Open vkeggers opened 4 months ago
Dear Viktoria, thanks for your interest in our work.
Yes, you are right, the names should be standardized. The best option would be something like "Sequence_1". Other tools such as reasonaTE do the same when processing files: they standardize and rename the sequences in FASTA files.
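As a rough illustration of the renaming described above, here is a minimal sketch in plain Python. The function name `standardize_fasta_headers` and the `prefix` parameter are hypothetical and not part of deTEct or reasonaTE; the sketch only assumes that each record's name is the first whitespace-delimited token after `>`.

```python
def standardize_fasta_headers(lines, prefix="Sequence_"):
    """Rename FASTA records to prefix + running index.

    Hypothetical helper, not part of deTEct. Returns the renamed lines
    (without trailing newlines) and a dict mapping old names to new names.
    """
    mapping = {}
    renamed = []
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            tokens = line[1:].split()
            # Old name: first token of the header (empty if the header is bare)
            original = tokens[0] if tokens else ""
            new_name = f"{prefix}{count}"
            mapping[original] = new_name
            renamed.append(">" + new_name)
        else:
            renamed.append(line.rstrip("\n"))
    return renamed, mapping


example = [">chrI some description\n", "ACGT\n", ">contig_7\n", "GG\n"]
new_lines, name_map = standardize_fasta_headers(example)
print(new_lines)
print(name_map)
```

The returned mapping matters because the same old-to-new names would also have to be applied to the first column of the VCF, otherwise the two files still disagree.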
Please let me know if this works for you. Best, Kevin
Right, my problem is just that one of the assemblies isn't chromosome scale. I know in my first post it was just 2 genomes, but that was to keep the question simple. I actually have like 10ish remanei/latens/briggsae species. Most of these are chr scale but one is pretty fragmented.
I turned it around and annotated TEs in the reference, got a vcf from the reference and query alignment, and then all the chr names match automatically bc the reference was used for both. Previously I was annotating TEs in the query, which if not chr scale won't align with the vcf.
Anyways, I'm not sure if one way is particularly right, but I got a similar pattern using either method. The only difference was that more events were found when using TEs annotated from the reference. But like I said, the pattern is the same and similar to your paper.
Thanks Kevin
Great to hear :-) If the problem is solved, can we close this issue?
I'm comparing two different nematode genomes. Both are a little fragmented (the main 6 chromosomes + a few extra contigs), and each has different sequence names and a different number of contigs. Do the sequence names in all the files have to match?
Traceback (most recent call last):
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/TranspositionDetector.py", line 81, in
    parseSniffles_SVs(seqHeadFile, svFile, ouFile1)
  File "/home/veggers/.conda/envs/transposition_detector_detect/share/TranspositionEventDetector_deTEct/ParserSniffles.py", line 46, in parseSniffles_SVs
    fW.write(sequenceDictB[chrom]+"\t"+"SVIM"+"\t"+"insertion"+"\t"+start+"\t"+end+"\t"+"."+"\t"+"+"+"\t"+"."+"\t"+info)
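The traceback above stops at the lookup `sequenceDictB[chrom]` and the final exception line is not shown. One plausible reading, and it is only an assumption, is that a chromosome name in the SV file has no matching entry among the FASTA sequence names. A quick check before running the tool could look like the sketch below. The helper names are hypothetical and not part of deTEct. The sketch assumes a standard VCF, where the first tab-separated column is the chromosome name and header lines start with `#`.

```python
def fasta_names(fasta_lines):
    """Collect record names (first token after '>') from FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def vcf_chroms(vcf_lines):
    """Collect the CHROM column from VCF data lines, skipping '#' headers."""
    chroms = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_chroms(fasta_lines, vcf_lines):
    """Return VCF chromosome names that do not appear in the FASTA, sorted."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


fasta = [">I\n", "ACGT\n", ">II\n", "GGCC\n"]
vcf = ["##fileformat=VCFv4.2\n", "I\t100\t.\tA\t<INS>\n", "scaffold_9\t5\t.\tC\t<INS>\n"]
print(missing_chroms(fasta, vcf))
```

Any name this reports is one that appears in the VCF but not in the FASTA headers, which is the mismatch that renaming the sequences or annotating TEs in the reference avoids.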