galaxyproject / usegalaxy-playbook

Ansible Playbook for usegalaxy.org
Academic Free License v3.0

Htseq_count can fail with RNA STAR input #84

Open jennaj opened 6 years ago

jennaj commented 6 years ago

Tracking ticket: Once the problem is resolved and Main is updated (as needed), we can close this out.

Workaround: Use HISAT2 instead of RNA STAR.

Example error

Fatal error: Unknown error occured
[bam_sort_core] merging from 32 files...
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
741207 GFF lines processed.
Error occured when processing SAM input (record #894 in file name_sorted_alignment.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

Potentially the root issue: https://www.biostars.org/p/147487/
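For anyone unfamiliar with this class of error: an OverflowError of this kind generally means a value outside 0..255 was stored into an unsigned-byte field. The sketch below reproduces that category of failure in plain Python using the standard `array` module. It is only an illustration and not the pysam/htseq code path; the helper names `pack_unsigned_bytes` and `try_pack` and the sample values are made up for the example.

```python
from array import array


def pack_unsigned_bytes(values):
    """Pack integers into an unsigned-byte array (typecode 'B', range 0..255)."""
    return array("B", values)


def try_pack(values):
    """Return (ok, detail) instead of raising, so the failure can be logged."""
    try:
        return True, pack_unsigned_bytes(values).tobytes()
    except OverflowError as exc:
        # Raised when any value is negative or larger than 255.
        return False, f"{type(exc).__name__}: {exc}"


# Hypothetical values: the first list fits in unsigned bytes, the second does not.
print(try_pack([0, 40, 255]))
print(try_pack([40, -1, 12]))
```

If the BAM record named in the log carries a field value that does not fit the type the parser expects, a failure of this shape would be one plausible result, but that is an assumption until the triggering input is available.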

Comments from @natefoo:

I believe the message is coming from the version of pysam in use by htseq (not samtools as used by the tool or pysam in the Galaxy framework). But it looks like we are using the latest htseq dependency supported by the IUC tool, 0.6.1.post1 (even though we're still using the tool from Lance's repo):

https://github.com/galaxyproject/tools-iuc/blob/6f82cbc16053cecdf58d15a8d0fcdeac7991abaf/tools/htseq_count/htseq-count.xml#L4

I'd pass this on to the IUC to see if they have any ideas.

natefoo commented 6 years ago

@davebx is updating to htseq-count 0.9.1, which hopefully fixes it (or at least it's worth testing once it's updated).

jennaj commented 6 years ago

Test history: https://usegalaxy.org/u/jen/h/test-history-rnastar (includes updated star 2.5.2b-0 + htseq 0.9.1)

I don't think this exact test will capture the specific error above, and we don't have an example of the inputs that trigger it (there was no original test history because the end user deleted it before I could get it back in January), but we can watch for it being reported again.

I'll close this out once the general-usage tests finish overnight.

jennaj commented 6 years ago

I can't get HTSeq to work (no features overlap, even when using HISAT2 input). featureCounts works with the same inputs. I am using tutorial data that should work with both.

Second test history: https://usegalaxy.org/u/jen/h/test-history-cufflinks-hisat2

I'll need to troubleshoot this more.
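One way to put a number on the "no features overlap" symptom is to summarize the htseq-count output table, where the special counters (for example `__no_feature`) are reported as rows whose names start with two underscores. This is a minimal sketch, assuming the usual two-column tab-separated output; the function name `summarize_htseq_counts` and the toy table are hypothetical and not taken from the test histories.

```python
def summarize_htseq_counts(text):
    """Split htseq-count output into feature counts and '__' special counters."""
    features, special = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, count = line.split("\t")
        target = special if name.startswith("__") else features
        target[name] = int(count)
    assigned = sum(features.values())
    total = assigned + sum(special.values())
    return {
        "assigned": assigned,
        "special": special,
        "assigned_fraction": assigned / total if total else 0.0,
    }


# Toy table (made-up numbers) mimicking a run where nothing overlaps a feature.
toy = (
    "geneA\t0\n"
    "geneB\t0\n"
    "__no_feature\t950\n"
    "__ambiguous\t0\n"
    "__too_low_aQual\t10\n"
    "__not_aligned\t40\n"
    "__alignment_not_unique\t0\n"
)
summary = summarize_htseq_counts(toy)
print(summary["assigned"], summary["special"]["__no_feature"])
```

An assigned fraction near zero alongside a large `__no_feature` count would match the symptom described here and could make it easier to compare runs across aligners and parameter settings.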

jennaj commented 6 years ago

Trying again with the newer versions of HISAT2 and STAR, plus complete reruns with different params. In progress, in the same test history as in the prior comment.

jennaj commented 6 years ago

Still a problem. It looks like the tool was updated in the Main Tool Shed (MTS) with a bug fix but without a revision change. Could we update it to the most current MTS version and see if this problem goes away, too?

https://github.com/galaxyproject/usegalaxy-playbook/issues/124

jennaj commented 6 years ago

Retesting with different data. I think it works with some data, so this might be a corner-case tool bug.