biocore / deblur

Deblur is a greedy deconvolution algorithm based on known read error profiles.
BSD 3-Clause "New" or "Revised" License
91 stars 41 forks source link

Deblur outputs lowercase characters #148

Closed mortonjt closed 7 years ago

mortonjt commented 7 years ago

Here is an example of deblurred sequence

>AACggggggggCAAGTGTTCTTCGGAATGACTAGGCGTAAAGGGCACGTAGGCGGTTCATCGGGTTGAAAGTGAAAGTCGTCAAAAAGTG
AACggggggggCAAGTGTTCTTCGGAATGACTAGGCGTAAAGGGCACGTAGGCGGTTCATCGGGTTGAAAGTGAAAGTCGTCAAAAAGTG

Is there a reason why this gets outputted? This can cause problems downstream, since q2-feature-classifier cannot handle lowercase characters.

mortonjt commented 7 years ago

ping on this issue - this is a serious user issue, because it is a major technical barrier when trying to perform taxonomic classification. Otherwise the user will be forced to be running tr '[:lower:]' '[:upper:]'

At the very least, we need to document this.

ElDeveloper commented 7 years ago

I too have encountered this problem, however I assumed this was a sequencing artifact (low quality NTs). There's an issue for this in q2-types https://github.com/qiime2/q2-types/issues/91 That being said, if deblur is changing NTs to lowercase, what is the rationale for that?

Yoshiki Vázquez-Baeza

On May 30, 2017, at 6:22 PM, Jamie Morton notifications@github.com wrote:

ping on this issue - this is a serious user issue, because it is a major technical barrier when trying to perform taxonomic classification. Otherwise the user will be forced to be running tr '[:lower:]' '[:upper:]'

At the very least, we need to document this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

wasade commented 7 years ago

It's stemming from MAFFT iirc. I'd be :+1: for a PR to coerce to all upper case

On Wed, May 31, 2017 at 6:40 AM, Yoshiki Vázquez Baeza < notifications@github.com> wrote:

I too have encountered this problem, however I assumed this was a sequencing artifact (low quality NTs). There's an issue for this in q2-types https://github.com/qiime2/q2-types/issues/91 That being said, if deblur is changing NTs to lowercase, what is the rationale for that?

Yoshiki Vázquez-Baeza

On May 30, 2017, at 6:22 PM, Jamie Morton notifications@github.com wrote:

ping on this issue - this is a serious user issue, because it is a major technical barrier when trying to perform taxonomic classification. Otherwise the user will be forced to be running tr '[:lower:]' '[:upper:]'

At the very least, we need to document this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/biocore/deblur/issues/148#issuecomment-305189582, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8svXiE0dhnFpptMGmPg00BC60sK-lks5r_W21gaJpZM4NCSce .

jakereps commented 7 years ago

Just stumbled across this through the q2-types issue. One thing QIIME 2 does (in q2_alignment.methods.mafft) is pass --preservecase to MAFFT, which may be an option for this as well by adding it to here.

wasade commented 7 years ago

Which q2types issue?

On Jun 15, 2017 12:48, "Jorden Kreps" notifications@github.com wrote:

Just stumbled across this through the q2-types issue. One thing QIIME 2 does (in q2_alignment.methods.mafft) is pass --preservecase to MAFFT, which may be an option for this as well by adding it to here https://github.com/biocore/deblur/blob/afba6a195bf6eb2c249bccffc3a842da64fdfaca/deblur/workflow.py#L519-L520 .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore/deblur/issues/148#issuecomment-308847329, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8smH03OCHImeaOYaSKpmbkKaGUSxgks5sEYqDgaJpZM4NCSce .

jakereps commented 7 years ago

The importing issue that Yoshiki mentioned, about QIIME 2 being unable to import lowercase sequences. I just ran into an issue with that by running MAFFT external to q2, and since he tagged it here this issue showed up on that page.

wasade commented 7 years ago

Ah, kk

On Jun 15, 2017 12:52, "Jorden Kreps" notifications@github.com wrote:

The importing issue that Yoshiki mentioned, about QIIME 2 being unable to import lowercase sequences. I just ran into an issue with that by running MAFFT external to q2, and since he tagged it here this issue showed up on that page.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/biocore/deblur/issues/148#issuecomment-308848238, or mute the thread https://github.com/notifications/unsubscribe-auth/AAc8sliAMGnBID96x44WW2VjMKN4Tpudks5sEYuAgaJpZM4NCSce .

ElDeveloper commented 7 years ago

Any way this can be included in a release soon?