Closed durwa004 closed 4 years ago
There were some pieces of code that were commented out on accident. I reverted that changes to what they should be. Can you try pulling the changes from GitHub or re-downloading the scripts and trying again?
It seems to be working now! Thank you
On Thu, Jan 16, 2020 at 4:37 PM Rob Schaefer notifications@github.com wrote:
There were some pieces of code that were commented out on accident. I reverted that changes to what they should be. Can you try pulling the changes from GitHub or re-downloading the scripts and trying again?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/UMN-EGGL/MNEc2to3/issues/4?email_source=notifications&email_token=ADY2HS2AQ7BOOYSQIUCWR5LQ6DOS5A5CNFSM4KHNLJ52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJFZQ5Q#issuecomment-575379574, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADY2HS7KTU2JBPKZFUYAUJ3Q6DOS5ANCNFSM4KHNLJ5Q .
-- Sian Durward-Akhurst, BVMS, MS, Diplomate ACVIM, MRCVS PhD Candidate in Equine Genetics
You know where to go once it breaks :D
So true!!
Sadly I think it has. I tried running it on my laptop to avoid the MSI limitations and it must have run for ~1.5 hours and then I get this error:
Traceback (most recent call last):
File "../MNEc2to3/scripts/01_make_probeseq.py", line 48, in <module>
print_flank_sequence(args.vcf,args.fasta,args.out)
File "../MNEc2to3/scripts/01_make_probeseq.py", line 20, in
print_flank_sequence
if fasta[var.chrom][var.pos] != var.ref:
File
"/Users/durwa004/Library/Python/3.7/lib/python/site-packages/locuspocus/fasta.py",
line 223, in __getitem__
raise ValueError(f'{chrom_name} not in {self.m80.name}')
ValueError: chr1 not in temp```
My computer was struggling to run it and so I reinstalled the environment I
created on msi and everything seemed to install successfully but I get this
error:
/home/mccuem/durwa004/.conda/envs/lp_sda/bin/python3 /home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/MNEc2to3/scripts/01_make_probeseq.py --vcf GCA_000002305.1.chrs.vcf.gz --fasta GCA_000002305.1_EquCab2.0_genomic.fna.gz --out dbsnp_probes.fa
Traceback (most recent call last):
File
"/home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/MNEc2to3/scripts/01_make_probeseq.py",
line 48, in
print_flank_sequence(args.vcf,args.fasta,args.out)
File "/home/mccuem/shared/Projects/HorseGenomeProject/scripts/EquCab3/MNEc2to3/scripts/01_make_probeseq.py", line 20, in print_flank_sequence
if fasta[var.chrom][var.pos] != var.ref:
File "/home/mccuem/durwa004/.local/lib/python3.7/site-packages/locuspocus/fasta.py", line 223, in getitem
raise ValueError(f'{chrom_name} not in {self.m80.name}')
ValueError: CM000377.2 not in temp
This is with the right version of the EquCab2 genome (downloaded from NCBI)
and the chromosome names are the same between the vcf and fasta as far as I
can tell.
Sorry to be a pain.
On Fri, Jan 17, 2020 at 7:54 AM Rob Schaefer <notifications@github.com>
wrote:
> You know where to go once it breaks :D
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/UMN-EGGL/MNEc2to3/issues/4?email_source=notifications&email_token=ADY2HS3UIF3TB7JEMETUI23Q6G2B7A5CNFSM4KHNLJ52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJHX34Q#issuecomment-575634930>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ADY2HS5U4MT5KB6QU6QYF2LQ6G2B7ANCNFSM4KHNLJ5Q>
> .
>
--
Sian Durward-Akhurst, BVMS, MS, Diplomate ACVIM, MRCVS
PhD Candidate in Equine Genetics
University of Minnesota Equine Genetics and Genomics Laboratory
225 Veterinary Population Medicine
1365 Gortner Avenue
Saint Paul, MN
55108
durwa004@umn.edu
Pronouns: she/her/hers
http://d.umn.edu/sexuality-gender-equity-initiatives/education-advocacy/pronouns
There appears to be a SNP in your VCF assigned to chromosome CM000377.2
which isn't in the reference genome. Here are all the chromosome names I could find on MSI:
[/home/mccuem/shared/FASTAs] % cat Equus_cab_nucl_wChrUn1_2.fasta | grep ">"
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr1
>chr20
>chr21
>chr22
>chr23
>chr24
>chr25
>chr26
>chr27
>chr28
>chr29
>chr2
>chr30
>chr31
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chrX
>chrUn
>chrUn2
So either we need to add those chromosomes to the fasta or remove them from the VCF file. Otherwise, we can add a chunk of code to catch those errors and skip those SNPs (which I think would be easiest). Thoughts?
That makes sense, but I am not sure that is it. I pulled the same reference as the one that the dbsnp SNPs are from (GCA_000002305.1) and made sure the chromosome names matched
This is the top of the dbsnp vcf file:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=ALMM,Number=0,Type=Flag,Description="Alleles mismatch flag,
present when some of the submitted variants have inconsistent allele
information. See https://github.com/EBI
##INFO=<ID=ASMM,Number=0,Type=Flag,Description="Assembly mismatch flag,
present when the reference allele doesn't match the reference sequence">
##INFO=<ID=LOE,Number=0,Type=Flag,Description="Lack of evidence flag,
present if no submitted variant includes genotype or frequency information">
##INFO=<ID=RS_VALIDATED,Number=0,Type=Flag,Description="RS validated flag,
present when the RS was validated by any method as indicated by the dbSNP
validation status">
##INFO=<ID=SID,Number=.,Type=String,Description="Identifiers of studies
that report a variant">
##INFO=<ID=SS_VALIDATED,Number=1,Type=Integer,Description="Number of
submitted variants clustered in an RS that were validated by any method as
indicated by the dbSNP validation s
##INFO=<ID=VC,Number=1,Type=String,Description="Variant class according to
the Sequence Ontology">
##contig=<ID=CM000386.2,accession="CM000386.2">
##contig=<ID=CM000387.2,accession="CM000387.2">
##contig=<ID=CM000388.2,accession="CM000388.2">
##contig=<ID=CM000389.2,accession="CM000389.2">
##contig=<ID=CM000390.2,accession="CM000390.2">
##contig=<ID=CM000391.2,accession="CM000391.2">
##contig=<ID=CM000392.2,accession="CM000392.2">
##contig=<ID=CM000393.2,accession="CM000393.2">
##contig=<ID=CM000394.2,accession="CM000394.2">
##contig=<ID=CM000395.2,accession="CM000395.2">
##contig=<ID=CM000377.2,accession="CM000377.2">
##contig=<ID=CM000396.2,accession="CM000396.2">
##contig=<ID=CM000397.2,accession="CM000397.2">
##contig=<ID=CM000398.2,accession="CM000398.2">
##contig=<ID=CM000399.2,accession="CM000399.2">
##contig=<ID=CM000400.2,accession="CM000400.2">
##contig=<ID=CM000401.2,accession="CM000401.2">
##contig=<ID=CM000402.2,accession="CM000402.2">
##contig=<ID=CM000403.2,accession="CM000403.2">
##contig=<ID=CM000404.2,accession="CM000404.2">
##contig=<ID=CM000405.2,accession="CM000405.2">
##contig=<ID=CM000378.2,accession="CM000378.2">
##contig=<ID=CM000406.2,accession="CM000406.2">
##contig=<ID=CM000407.2,accession="CM000407.2">
##contig=<ID=CM000379.2,accession="CM000379.2">
##contig=<ID=CM000380.2,accession="CM000380.2">
##contig=<ID=CM000381.2,accession="CM000381.2">
##contig=<ID=CM000382.2,accession="CM000382.2">
##contig=<ID=CM000383.2,accession="CM000383.2">
##contig=<ID=CM000384.2,accession="CM000384.2">
##contig=<ID=CM000385.2,accession="CM000385.2">
##contig=<ID=NC_001640.1,accession="X79547.1">
##contig=<ID=CM000408.2,accession="CM000408.2">
##reference=<ID=EquCab2.0,accession=GCA_000002305.1,URL=
https://www.ebi.ac.uk/ena/data/view/GCA_000002305.1>
##bcftools_annotateVersion=1.2+htslib-1.2.1
##bcftools_annotateCommand=annotate --rename-chrs
rename_dbsnp_chrs_with_chr.txt GCA_000002305.1_current_ids.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO
CM000377.2 425 rs396553607 A G . .
LOE;SID=DINDOTLAB_QH_SNPS;SS_VALIDATED=0;VC=SO:0001483
CM000377.2 551 rs394481920 T G . .
LOE;SID=EGGL_MNEC_23M,DINDOTLAB_QH_SNPS;SS_VALIDATED=0;VC=SO:0001483
CM000377.2 561 rs395435164 G C . .
LOE;SID=EGGL_MNEC_23M,DINDOTLAB_QH_SNPS;SS_VALIDATED=0;VC=SO:0001483
These are the fasta file chromosome names
cat GCA_000002305.1_EquCab2.0_genomic_renamed.fna | grep ">C"
>CM000377.2
>CM000378.2
>CM000379.2
>CM000380.2
>CM000381.2
>CM000382.2
>CM000383.2
>CM000384.2
>CM000385.2
>CM000386.2
>CM000387.2
>CM000388.2
>CM000389.2
>CM000390.2
>CM000391.2
>CM000392.2
>CM000393.2
>CM000394.2
>CM000395.2
>CM000396.2
>CM000397.2
>CM000398.2
>CM000399.2
>CM000400.2
>CM000401.2
>CM000402.2
>CM000403.2
>CM000404.2
>CM000405.2
>CM000406.2
>CM000407.2
>CM000408.2
On Fri, Jan 17, 2020 at 12:15 PM Rob Schaefer notifications@github.com wrote:
There appears to be a SNP in your VCF assigned to chromosome CM000377.2 which isn't in the reference genome. Here are all the chromosome names I could find on MSI:
[/home/mccuem/shared/FASTAs] % cat Equus_cab_nucl_wChrUn1_2.fasta | grep ">"
chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr1 chr20 chr21 chr22 chr23 chr24 chr25 chr26 chr27 chr28 chr29 chr2 chr30 chr31 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrX chrUn chrUn2
So either we need to add those chromosomes to the fasta or remove them from the VCF file. Otherwise, we can add a chunk of code to catch those errors and skip those SNPs (which I think would be easiest). Thoughts?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/UMN-EGGL/MNEc2to3/issues/4?email_source=notifications&email_token=ADY2HS7X2C2OSYAXITT6NGDQ6HYVRA5CNFSM4KHNLJ52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJIQ2IQ#issuecomment-575737122, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADY2HSZAJVY46FZVEYYPMILQ6HYVRANCNFSM4KHNLJ5Q .
-- Sian Durward-Akhurst, BVMS, MS, Diplomate ACVIM, MRCVS PhD Candidate in Equine Genetics
University of Minnesota Equine Genetics and Genomics Laboratory 225 Veterinary Population Medicine 1365 Gortner Avenue Saint Paul, MN 55108
durwa004@umn.edu
Pronouns: she/her/hers http://d.umn.edu/sexuality-gender-equity-initiatives/education-advocacy/pronouns
I have successfully installed the requirements to run this, so that I can try and remap the dbsnp loci. However when I run this command:
I get this error:
I'm not sure where I'm going wrong, I have renamed the chromosomes in the dbsnp file so that they match the fasta and even tried sorting the vcf to be in the same order as the fasta and am still getting the same error. Not sure where I am going wrong!