gaurav / taxondna

Taxonomy-aware DNA sequence processing toolkit
http://www.ggvaidya.com/taxondna/
GNU General Public License v2.0

Most of my sequences are not added #60

Open LLVJ opened 8 years ago

LLVJ commented 8 years ago

Hello, I want to add sequence data for 370 plant species using matK and rbcL genes, but it only adds 5 species. The window says:

"Some sequences in the taxonset CES_matK norm nimed weren't added. These are:
Gymnocarpium dryopteris: It is too short (945 bp, while the column is supposed to be 1206 bp)
Matteuccia struthiopteris: It is too short (762 bp, while the column is supposed to be 1206 bp)
Dryopteris filix-mas: It is too short (615 bp, while the column is supposed to be 1206 bp)
Athyrium filix-femina: It is too short (999 bp, while the column is supposed to be 1206 bp)
Dryopteris carthusiana: It is too short (963 bp, while the column is supposed to be 1206 bp)
Asplenium trichomanes: It is too short (978 bp, while the column is supposed to be 1206 bp)
Cystopteris sudetica: It is too short (930 bp, while the column is supposed to be 1206 bp)
Thelypteris palustris: It is too short (834 bp, while the column is supposed to be 1206 bp)
Botrychium lunaria: It is too long (1503 bp, while the column is supposed to be 1206 bp)
Ophioglossum vulgatum: It is too short (777 bp, while the column is supposed to be 1206 bp)
Equisetum arvense: It is too long (1431 bp, while the column is supposed to be 1206 bp)
etc"

What should I do? Thank you!

gaurav commented 8 years ago

Hey there! Sequence Matrix expects aligned sequences, and so won't let you import unaligned sequences, since it doesn't know how to pad them. Why are you trying to import unaligned sequences? One quick fix would be to align them using a quick-and-dirty aligner, but I'm not sure if that's what you're trying to do.
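To see ahead of time which sequences would be rejected, a rough length check along the following lines may help. This is only a sketch and not Sequence Matrix's own code: it assumes the input is a FASTA file, it treats the most common sequence length as the expected column length, and the function names `read_fasta_lengths` and `report_length_mismatches` are made up for illustration.

```python
from collections import Counter
from io import StringIO


def read_fasta_lengths(handle):
    """Return {record name: sequence length} for a FASTA-formatted text stream.

    Hypothetical helper; records that share a name overwrite each other.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            # Gap characters ("-") are counted too: in an alignment they
            # occupy columns just like bases do.
            lengths[name] += len(line)
    return lengths


def report_length_mismatches(lengths):
    """Return (most common length, {name: length} for records that differ).

    Assumption: the most common length stands in for the column length.
    """
    if not lengths:
        return None, {}
    expected, _ = Counter(lengths.values()).most_common(1)[0]
    mismatches = {n: l for n, l in lengths.items() if l != expected}
    return expected, mismatches


# Tiny made-up example; real input would come from open("your_file.fasta").
example = StringIO(
    ">Gymnocarpium dryopteris\nACGT--ACGT\n"
    ">Botrychium lunaria\nACGTACGTACGT\n"
    ">Equisetum arvense\nACGTAC\nGTAC\n"
)
expected, mismatches = report_length_mismatches(read_fasta_lengths(example))
print("expected length:", expected)
for name, length in mismatches.items():
    print(f"{name}: {length} bp")
```

If every record reports the same length, the file is at least consistent with being aligned. If lengths differ, as in the error message above, the sequences probably need to go through an aligner before import.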

jpiaskowski commented 7 years ago

I'm having the same issue, but with aligned sequences. How does Sequence Matrix decide the proper length of a sequence? When I check the sequence lengths in bash, they all look the same. Thanks!

gaurav commented 7 years ago

@jpiaskowski Strange! It should be reading the length of each sequence separately. Could you please e-mail me a file that Sequence Matrix isn't opening? My e-mail address is gaurav[at]ggvaidya[dot]com.