Do sequence heads have to match?

DerKevinRiehl / transposition_detector_deTEct

Transposition event detection tool using NGS alignment data and SV calling outputs (VCF files) from PBSV or Sniffles

GNU General Public License v3.0


Do sequence heads have to match? #3

Open vkeggers opened 4 months ago

vkeggers commented 4 months ago

I'm comparing two nematode genomes that are a little fragmented (the six main chromosomes plus a few extra contigs), and each has different sequence names and a different number of contigs. Do the sequence names have to match across all the input files?

Traceback (most recent call last):
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/TranspositionDetector.py", line 81, in <module>
    parseSniffles_SVs(seqHeadFile, svFile, ouFile1)
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/ParserSniffles.py", line 46, in parseSniffles_SVs
    fW.write(sequenceDictB[chrom]+"\t"+"SVIM"+"\t"+"insertion"+"\t"+start+"\t"+end+"\t"+"."+"\t"+"+"+"\t"+"."+"\t"+info)
KeyError: 'CM021144.1_356'
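The KeyError means that a CHROM value from the VCF ('CM021144.1_356') has no entry in the name dictionary the parser looks it up in (sequenceDictB in ParserSniffles.py). One way to spot such mismatches before running the tool is to compare the two sets of names. This is a minimal sketch, assuming an uncompressed VCF and assuming the names the parser expects are the FASTA header names; the helper functions are hypothetical (not part of deTEct) and the in-memory files are toy data.

```python
import io
from typing import Iterable, List, Set, TextIO


def fasta_names(handle: TextIO) -> Set[str]:
    """Collect sequence names (first word after '>') from a FASTA file."""
    return {line[1:].split()[0] for line in handle if line.startswith(">")}


def vcf_chroms(handle: TextIO) -> Set[str]:
    """Collect the CHROM column (first tab-separated field) of every VCF data line."""
    chroms = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the #CHROM header and blank lines
        chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_names(vcf_names: Iterable[str], known: Set[str]) -> List[str]:
    """Return VCF contigs that would fail a dictionary lookup keyed by `known`."""
    return sorted(set(vcf_names) - known)


if __name__ == "__main__":
    # Toy inputs: one VCF record sits on a contig that the FASTA does not contain.
    fasta = io.StringIO(">CM021144.1 chromosome I\nACGT\n>CM021145.1\nACGT\n")
    vcf = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "CM021144.1\t100\tsv1\tN\t<INS>\t.\tPASS\tSVTYPE=INS\n"
        "CM021144.1_356\t200\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS\n"
    )
    print(missing_names(vcf_chroms(vcf), fasta_names(fasta)))  # ['CM021144.1_356']
```

An empty list suggests every VCF contig has a matching sequence name; anything listed would need to be renamed or removed first.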

If that is the case, I guess I could just extract the main chromosomes from all the files and standardize their names.
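Extracting the main chromosomes could be done with a short streaming filter over each FASTA file. This is a minimal sketch; filter_fasta is a hypothetical helper and the chromosome names in the example are placeholders. VCF records on the dropped contigs would still have to be removed separately, otherwise the same lookup error would remain.

```python
import io
from typing import Set, TextIO


def filter_fasta(src: TextIO, dst: TextIO, keep: Set[str]) -> int:
    """Copy only records whose name (first word of the header) is in `keep`.

    Returns the number of records written.
    """
    writing = False
    kept = 0
    for line in src:
        if line.startswith(">"):
            # A new record starts: decide whether to copy it and its sequence lines.
            writing = line[1:].split()[0] in keep
            if writing:
                kept += 1
        if writing:
            dst.write(line)
    return kept


if __name__ == "__main__":
    src = io.StringIO(">chrI\nACGT\n>contig_7\nGG\n>chrII\nTT\n")
    dst = io.StringIO()
    print(filter_fasta(src, dst, {"chrI", "chrII"}))  # 2
    print(dst.getvalue(), end="")
```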

DerKevinRiehl commented 4 months ago

Dear Viktoria, thanks for your interest in our work.

Yes, you are right: the names should be standardized. Something like "Sequence_1" works best, since other software, such as reasonaTE from TransposonUltimate, also standardizes and renames the sequences of FASTA files when processing them.
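Standardizing to names like "Sequence_1" only helps if every input file is renamed with the same mapping. The sketch below assumes FASTA and uncompressed VCF input: it renames FASTA records in file order, returns the old-to-new mapping, and applies that mapping to the VCF CHROM column and ##contig header lines. The function names are hypothetical; other inputs (for example a TE annotation file) would presumably need the same mapping, and keeping the returned dictionary lets you translate results back to the original names.

```python
import io
import re
from typing import Dict, TextIO

# Matches the ID field at the start of a "##contig=<ID=...,...>" header line.
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def rename_fasta(src: TextIO, dst: TextIO, prefix: str = "Sequence_") -> Dict[str, str]:
    """Rename FASTA records to Sequence_1, Sequence_2, ... in file order.

    Returns a dict mapping each original name (first word of the header)
    to its new name. Header descriptions are dropped.
    """
    mapping: Dict[str, str] = {}
    count = 0
    for line in src:
        if line.startswith(">"):
            count += 1
            old = line[1:].split()[0]
            new = f"{prefix}{count}"
            mapping[old] = new
            dst.write(f">{new}\n")
        else:
            dst.write(line)
    return mapping


def rename_vcf(src: TextIO, dst: TextIO, mapping: Dict[str, str]) -> None:
    """Apply an old->new name mapping to ##contig IDs and the CHROM column."""
    for line in src:
        if line.startswith("##contig="):
            # Unknown contig IDs in the header are left unchanged.
            line = CONTIG_ID.sub(
                lambda m: "##contig=<ID=" + mapping.get(m.group(1), m.group(1)),
                line,
            )
        elif line.strip() and not line.startswith("#"):
            chrom, sep, rest = line.partition("\t")
            # A KeyError here means the VCF uses a contig absent from the FASTA.
            line = mapping[chrom] + sep + rest
        dst.write(line)


if __name__ == "__main__":
    fa_out, vcf_out = io.StringIO(), io.StringIO()
    names = rename_fasta(io.StringIO(">CM021144.1 chrI\nACGT\n>scaf_356\nGG\n"), fa_out)
    rename_vcf(
        io.StringIO("##contig=<ID=scaf_356,length=2>\n#CHROM\tPOS\nscaf_356\t1\n"),
        vcf_out,
        names,
    )
    print(names)
    print(vcf_out.getvalue(), end="")
```

The strict mapping[chrom] lookup for data lines is deliberate: a contig missing from the FASTA fails loudly during renaming rather than later inside the tool.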

Please let me know if this works for you. Best, Kevin

vkeggers commented 3 months ago

Right, my problem is just that one of the assemblies isn't chromosome-scale. I know my first post mentioned only two genomes, but that was to keep the question simple. I actually have about ten remanei/latens/briggsae genomes. Most of them are chromosome-scale, but one is quite fragmented.

So I turned it around: I annotated TEs in the reference, generated a VCF from the alignment of the query against the reference, and then all the chromosome names matched automatically because the reference was used for both. Previously I was annotating TEs in the query, and if the query assembly is not chromosome-scale, its sequence names won't line up with those in the VCF.

Anyway, I'm not sure whether one approach is more correct, but I got a similar pattern with either method. The only difference was that more events were detected when using TEs annotated in the reference. As I said, though, the pattern is the same, and similar to the one in your paper.

Thanks, Kevin

DerKevinRiehl commented 3 months ago

Great to hear :-) If the problem is solved, can we close this issue?