clairemerot / genotyping_SV

Genotype SV in multiple samples sequenced with short-reads, using a catalog of SV

Can't find the python script "01_scripts/fasta_extract_flankingregions_claire.py" #1

Open JAKO-waccbip opened 1 year ago

JAKO-waccbip commented 1 year ago

Hi Claire,

While running the genotyping_SV pipeline, I could not find the script "01_scripts/fasta_extract_flankingregions_claire.py". It is called by the command "python3 ../01_scripts/fasta_extract_flankingregion_claire.py Genome.fasta all_samples_DP1_MISS50_2all.chrpos 1 all_samples_DP1_MISS50_2all.variants.ref" in 07c_filter_pop_vcf.sh (https://github.com/clairemerot/genotyping_SV/blob/main/01_scripts/07c_filter_pop_vcf.sh).

These are the heads of my files.

Output of (less "$VCF_BASE"_MISS50_2all.chrpos):

AgamP4_2L 8969
AgamP4_2L 26006
AgamP4_2L 57396
AgamP4_2L 57397
AgamP4_2L 76129
AgamP4_2L 76902

Output of (less genome.fasta):

>AgamP4_2L | organism=Anopheles_gambiae_PEST | version=AgamP4 | length=49364325 | SO=chromosome
AACCATGGTCCAGAGTACACATTGACTATGCAGGCCTAGTAGACGAATTCTACTTCCTTG
TAATCGTGGATCCACACTCGAAATGGCCGGAAGTTTACGCTACCAGATCAATAACTGCGA
GAACAACAATAAGAATTTTGAAACAAATTTTCGCAACTTTCGGAGTGCCAGAAGTTCTCG
TGTCTGATAACGGTACTCAATTTACCAGTTACGAGTTTAAGGAGTTTTGCGTTAGTCAAG
GCATCCAACACTTGCGCATTGCTCCATATCATCCGCAATCCAACGGGTTAGCTGAACGAT
TTGTGGATACACTGAAACGAAGTATTCAAAAAATTCGCAAGGGAGGGAATCTCTCGAAGA
TGCACTAACCACTTTCCTTCAAGTATATCGAACCACATCATCTGGAGATTTGGATGGAAA
AGCTCCTGCTGACATTATGTTCTCTAGACCATTACGAACTATTTCGTCGTTCCTCAAACC
AAGCGAGCACGGAAATGTTGAGCCGAGGAACAGAATGAAGGAAGCCGAATTTTTCAACAA
GAAGCACGGGGCAGTGAAATGATGTTATCAACAGGGCGATGCTGTTTATGTCAAGATATA

Please, what might be wrong with the run?

Another thing is that I am using pooled DNA samples for this analysis. I don't know whether that is appropriate.

I am looking forward to hearing from you soon.

Jonas

clairemerot commented 1 year ago

Dear Jonas,

Thanks for reporting the missing script. I have now added it to the repository.

That being said, I'm not sure what you are trying to do. While this set of scripts is meant to be useful for other research, the scripts are not directly transferable to other datasets (and, I'm afraid, not documented well enough).

Please note that the script you are using (07c) is meant to build a kind of dummy Beagle file with the correct GL and chr, and with start as the position, but with just one base (extracted from the reference by the Python script) in REF/ALT. The goal was simply to make the file fit into pcangsd to perform a PCA on genotype likelihoods for SVs.
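To make the single-base lookup concrete, here is a minimal Python sketch of the idea. It is not the repository's script: the function names `read_fasta` and `bases_at` are hypothetical, the demo sequence is made up, and it assumes that positions in the .chrpos file are 1-based (as in a VCF) and that the chromosome name is the FASTA header text up to the first whitespace.

```python
import io


def read_fasta(handle):
    """Return {sequence_id: sequence}. The ID is the header text up to the first whitespace."""
    seqs = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def bases_at(seqs, chrpos_lines):
    """Yield (chrom, pos, base) for whitespace-separated 'chrom pos' lines.

    Assumption: pos is 1-based, so it maps to index pos - 1 in the sequence.
    """
    for line in chrpos_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], int(fields[1])
        yield chrom, pos, seqs[chrom][pos - 1].upper()


# Made-up demo data, only to show the mechanics.
demo_fasta = io.StringIO(">chrA | demo header\nACGTAC\nGGTT\n")
seqs = read_fasta(demo_fasta)
result = list(bases_at(seqs, ["chrA 1", "chrA 7"]))
print(result)
```

With a header such as the one in the genome.fasta shown earlier in this thread, this parsing would give the ID "AgamP4_2L", which is the name that has to match the first column of the .chrpos file for the lookup to succeed.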

What do you mean by pooled DNA samples?

Good luck with your research, and feel free to get in touch if I can help you with using the scripts.

Claire