@lawtyunc are you trying to run the WGS pipeline on mouse or human data?
mouse data
From: peter-d-fields @.> Sent: Wednesday, August 7, 2024 3:26 PM To: TheJacksonLaboratory/cs-nf-pipelines @.> Cc: Laws, Tyler @.>; Mention @.> Subject: Re: [TheJacksonLaboratory/cs-nf-pipelines] Reference files for WGS (Issue #8)
@lawtyunc Okay, if that's the case then you should be able to run the pipeline using the mouse GRCm39 reference builds by setting the gen_org parameter to mouse (this is the default) and the genome_build parameter to GRCm39.
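One way to record these two settings, sketched here in Python, is to write them to a JSON file that Nextflow can read through its -params-file option. The parameter names gen_org and genome_build come from this thread; the helper name write_params and the output file name are hypothetical, and any other parameters the pipeline requires are not shown.

```python
import json
from pathlib import Path


def write_params(path, gen_org="mouse", genome_build="GRCm39"):
    """Write the two parameters discussed above to a JSON params file.

    The function name and file layout are illustrative assumptions,
    not part of the pipeline itself.
    """
    params = {"gen_org": gen_org, "genome_build": genome_build}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params
```

Calling write_params("params.json") produces a file that could be passed to nextflow run with -params-file params.json, alongside whatever other options your run needs.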
Do I still need to go and retrieve these reference files?
@lawtyunc You will need each of the reference files, yes. Are there particular files you do not presently have access to?
Hey Peter,
Mus_musculus.GRCm39.dna.primary_assembly.genome.bed
Mus_musculus.GRCm39.dna.toplevel.primaryChr.contig_list
These are the files I have been struggling to find. Do you guys have these available? Is the contig list file called something else by default?
@lawtyunc We don't presently have a public repository to access these files but I can send them to you if you provide an email address.
@lawtyunc Two quick points. 1) You may have misidentified the files you need. I think you need (assuming you're following our current pipeline specifications):
Mus_musculus.GRCm39.dna.primary_assembly.bed
Mus_musculus.GRCm39.dna.primary_assembly.primaryChr.contig_list
2) The manner in which you're interfacing with GitHub issues has listed the following as your email address: ***@***.***. I assume this cannot be correct, so you will need to reply in a different way so that the correct address is available to share the files.
@peter-d-fields Would you be able to send me an email address that I can contact you at?
The contents of the two files you're missing are actually relatively simple, so I can include them here.
Mus_musculus.GRCm39.dna.primary_assembly.bed
1 1 195154279
10 1 130530862
11 1 121973369
12 1 120092757
13 1 120883175
14 1 125139656
15 1 104073951
16 1 98008968
17 1 95294699
18 1 90720763
19 1 61420004
2 1 181755017
3 1 159745316
4 1 156860686
5 1 151758149
6 1 149588044
7 1 144995196
8 1 130127694
9 1 124359700
MT 1 16299
X 1 169476592
Y 1 91455967
Mus_musculus.GRCm39.dna.primary_assembly.primaryChr.contig_list
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
X
Y
MT
You should be able to make a local copy and hopefully the pipeline will run for you.
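Before launching the pipeline, it may be worth checking that the two local copies agree with each other. The sketch below assumes the BED file has three whitespace-separated columns (contig, start, end) as shown above and that the contig list has one name per line. The function names are hypothetical helpers, not part of the pipeline.

```python
def read_bed_contigs(bed_path):
    """Return {contig: (start, end)} from a three-column BED-like file."""
    contigs = {}
    with open(bed_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            contigs[fields[0]] = (int(fields[1]), int(fields[2]))
    return contigs


def read_contig_list(list_path):
    """Return the contig names from a one-name-per-line file."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def check_reference_files(bed_path, list_path):
    """Return the sorted contig names that appear in only one of the two files."""
    bed_names = set(read_bed_contigs(bed_path))
    list_names = set(read_contig_list(list_path))
    return sorted(bed_names ^ list_names)
```

An empty list from check_reference_files means both files name the same contigs. Many tools expect BED columns to be separated by tabs, so if copying from this page turned tabs into spaces, the separators may need to be restored.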
Thank you so much! For future reference, how were you guys able to generate these files?
@lawtyunc These particular files are not constructed by a script that I know of. They are constructed either by directly inputting the values needed or by modifying, say, a .fai index to retain only those rows which are needed.
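The .fai route could be sketched roughly as follows in Python. This is not a script the maintainers use. It assumes a standard samtools-style .fai index (tab-separated, with the contig name in column 1 and its length in column 2), assumes the primary contigs are 1-19, X, Y and MT as in the lists posted in this thread, and writes a start value of 1 to match the BED shown there. The function names are hypothetical.

```python
def primary_contigs():
    """Assumed primary mouse contigs: 1-19, X, Y, MT."""
    return [str(i) for i in range(1, 20)] + ["X", "Y", "MT"]


def fai_to_reference_files(fai_path, bed_path, list_path, keep=None):
    """Filter a .fai index down to the wanted contigs and write both files."""
    keep = keep or primary_contigs()
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Column 1 is the contig name, column 2 its length in bases.
            if len(fields) >= 2 and fields[0] in keep:
                lengths[fields[0]] = int(fields[1])

    # BED rows sorted by name as text (1, 10, 11, ..., 2, ..., MT, X, Y).
    with open(bed_path, "w") as bed:
        for name in sorted(lengths):
            bed.write(f"{name}\t1\t{lengths[name]}\n")

    # Contig list in the order given by `keep`, one name per line.
    with open(list_path, "w") as out:
        for name in keep:
            if name in lengths:
                out.write(name + "\n")
    return lengths
```

Contigs in the index that are not in the keep list, such as unplaced scaffolds, are simply dropped.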
If the pipeline is running for you we can go ahead and close this issue.
Hello Jackson Lab. I am currently running your WGS pipeline and we are getting errors at the WGS:GATK_HAPLOTYPECALLER_INTERVAL step. I believe it is due to the format of the files we are supplying: our counterpart of Homo_sapiens_assembly38.primary_chrom.bed (except that I am doing this for GRCm39) for the --primary_chrom_bed param, and Mus_musculus.GRCm38.dna.toplevel.primaryChr.contig_list for the --chrom_contigs param. I was using the approach linked below to try to create these files, but I am not sure it is the best solution. Do you guys have reference copies of these files for mouse GRCm39?
https://www.biostars.org/p/486600/