gaurav / taxondna

Taxonomy-aware DNA sequence processing toolkit
http://www.ggvaidya.com/taxondna/
GNU General Public License v2.0

Problems with importing multiple files #98

Open NicolasLouw opened 2 years ago

NicolasLouw commented 2 years ago

Good day,

Sorry for posting another issue so soon, but I went through the existing issues and could not find one similar enough to mine.

I am trying to combine many separate multiple sequence alignment files and organise them into one large combined multiple sequence alignment, with a character set that I want to use as input for a maximum-likelihood tree in IQ-TREE. I used MAFFT within OrthoFinder to obtain multiple sequence alignments across 8 different species. As a result, I have 10210 separate multiple sequence alignment files.

The headers in those FASTA files were problematic because they were unique for the different protein names, so I standardised the headers in all the FASTA files using sed. Now there are only 8 unique headers, representing my 8 species.

I am able to load these files into SequenceMatrix, but I have one issue. When I drag and drop all the files, I get a warning message: "Some sequences in the taxonset OG0000003 weren't added. These are: Penicillium-brevicompactum: Multiple sequences with the same name found, only the largest one is being used"

If I click OK, it successfully imports some of the sequences. However, this warning comes up for all of my separate files, and there are over ten thousand of them. Is there a way to import my files while bypassing this warning message?
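For reference, the warning appears whenever a file contains more than one sequence under the same header, which is what happens once the per-protein names are collapsed to species names. A minimal Python sketch of a pre-processing step that removes those duplicates before import is given below. It is not part of SequenceMatrix. The function names, the `*.fa` file pattern, and the reading of "largest" as "most non-gap characters" are all assumptions made for illustration.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs, in file order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def ungapped_length(seq):
    """Count residues, ignoring '-' gap characters (assumed gap symbol)."""
    return len(seq) - seq.count("-")


def keep_longest_per_header(records):
    """Keep one sequence per header: the one with the most non-gap characters.

    Assumption: this approximates "only the largest one is being used".
    """
    best = {}
    for header, seq in records:
        if header not in best or ungapped_length(seq) > ungapped_length(best[header]):
            best[header] = seq
    return list(best.items())


def write_fasta(records):
    """Serialise (header, sequence) pairs back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def deduplicate_directory(in_dir, out_dir, pattern="*.fa"):
    """Write a deduplicated copy of every matching file into out_dir.

    The "*.fa" pattern is a placeholder; adjust it to the real file extension.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(in_dir).glob(pattern)):
        records = keep_longest_per_header(read_fasta(path.read_text()))
        (out / path.name).write_text(write_fasta(records))
```

Calling `deduplicate_directory("alignments", "alignments_dedup")` (both directory names hypothetical) would leave the original files untouched and produce copies with at most one sequence per species. Whether keeping only the longest copy is appropriate for the downstream tree is a separate question, since the duplicates may be paralogs.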

Thank you so much in advance!

Best, Nicolas