TheJacksonLaboratory / cs-nf-pipelines

The Jackson Laboratory Computational Sciences Nextflow based analysis pipelines
MIT License

Reference files for WGS #8

Closed lawtyunc closed 2 weeks ago

lawtyunc commented 1 month ago

Hello Jackson lab. I am currently running your WGS pipeline and we are getting errors at the WGS:GATK_HAPLOTYPECALLER_INTERVAL step. I believe it is due to the format of two of our files: our equivalent of Homo_sapiens_assembly38.primary_chrom.bed (except that I am doing this for GRCm39) for the --primary_chrom_bed param, and our equivalent of Mus_musculus.GRCm38.dna.toplevel.primaryChr.contig_list for the --chrom_contigs param. I have been using the approach linked below to try to create these files, but I am not sure it is the best solution. Do you have reference versions of these files for mouse GRCm39?

https://www.biostars.org/p/486600/

peter-d-fields commented 1 month ago

@lawtyunc are you trying to run the WGS pipeline on mouse or human data?

lawtyunc commented 1 month ago

mouse data



peter-d-fields commented 1 month ago

@lawtyunc okay, if that's the case then you should be able to run the pipeline using the mouse GRCm39 reference builds by setting the gen_org parameter to mouse (this is the default) and the genome_build parameter to GRCm39.

lawtyunc commented 1 month ago

Do I still need to go and retrieve these reference files?



peter-d-fields commented 1 month ago

@lawtyunc You will need each of the reference files, yes. Are there particular files you do not presently have access to?

lawtyunc commented 1 month ago

Hey Peter,

Mus_musculus.GRCm39.dna.primary_assembly.genome.bed
Mus_musculus.GRCm39.dna.toplevel.primaryChr.contig_list

These are the files I have been struggling to find. Do you have these available? Is the contig list file called something else by default?



peter-d-fields commented 1 month ago

@lawtyunc We don't presently have a public repository to access these files but I can send them to you if you provide an email address.

peter-d-fields commented 1 month ago

@lawtyunc Two quick points. 1) You may have misidentified the files you need. I think you need (assuming you're following our current pipeline specifications):

Mus_musculus.GRCm39.dna.primary_assembly.bed  
Mus_musculus.GRCm39.dna.primary_assembly.primaryChr.contig_list

2) The manner in which you're interfacing with Github issues has listed the following as your email address: ***@***.***. I assume this cannot be correct so you will need to reply in a different way so that the correct address is available to share the files.

lawtyunc commented 1 month ago

@peter-d-fields Would you be able to send me an email address that I can contact you at?



peter-d-fields commented 1 month ago

The contents of the two files you're missing are actually relatively simple, so I can include them here.

Mus_musculus.GRCm39.dna.primary_assembly.bed

1       1       195154279
10      1       130530862
11      1       121973369
12      1       120092757
13      1       120883175
14      1       125139656
15      1       104073951
16      1       98008968
17      1       95294699
18      1       90720763
19      1       61420004
2       1       181755017
3       1       159745316
4       1       156860686
5       1       151758149
6       1       149588044
7       1       144995196
8       1       130127694
9       1       124359700
MT      1       16299
X       1       169476592
Y       1       91455967

Mus_musculus.GRCm39.dna.primary_assembly.primaryChr.contig_list

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
X
Y
MT

You should be able to make a local copy and hopefully the pipeline will run for you.

lawtyunc commented 1 month ago

Thank you so much!! For future reference, how were you able to generate these files?



peter-d-fields commented 1 month ago

@lawtyunc These particular files are not constructed by a script that I know of. They are made either by directly inputting the values needed or by modifying, say, a .fai index to retain only the rows that are needed.
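For anyone who would rather script the .fai approach than edit by hand, here is a minimal sketch in Python. It is not part of the pipeline. The function names, the sample index text, and the placeholder scaffold name are hypothetical, and it assumes the index uses the Ensembl-style contig names shown above (1 to 19, X, Y, MT). It writes a start column of 1 only to mirror the BED contents posted in this thread. Rows come out in the order of the contig list rather than the sorted order of the BED file above.

```python
# Assumed primary contig names, matching the contig list posted in this thread.
PRIMARY_CONTIGS = [str(i) for i in range(1, 20)] + ["X", "Y", "MT"]


def read_fai_lengths(fai_text):
    """Return {contig_name: length} from the text of a FASTA .fai index.

    Only the first two tab-separated columns (name, length) are used.
    """
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def build_reference_files(fai_text, contigs=PRIMARY_CONTIGS):
    """Return (bed_text, contig_list_text) restricted to the wanted contigs."""
    lengths = read_fai_lengths(fai_text)
    # Keep only contigs present in the index, in the order given by `contigs`.
    kept = [name for name in contigs if name in lengths]
    # Start column of 1 mirrors the file shown in this thread (an assumption).
    bed_text = "".join(f"{name}\t1\t{lengths[name]}\n" for name in kept)
    contig_list_text = "".join(f"{name}\n" for name in kept)
    return bed_text, contig_list_text


# Illustrative index text: the lengths for 19 and MT are taken from the thread,
# the remaining columns are placeholders, and "example_scaffold" is hypothetical.
sample_fai = (
    "19\t61420004\t0\t60\t61\n"
    "example_scaffold\t1000\t0\t60\t61\n"
    "MT\t16299\t0\t60\t61\n"
)
bed_text, contig_list_text = build_reference_files(sample_fai)
print(bed_text, end="")
print(contig_list_text, end="")
```

To use it on a real index, read the .fai file into a string, pass it to `build_reference_files`, and write the two returned strings to the file names given earlier in this thread.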

If the pipeline is running for you we can go ahead and close this issue.