bioturing / hera

MIT License

missing output column? #8

Closed colindaven closed 7 years ago

colindaven commented 7 years ago

Hi,

so hera ran nicely using the downloaded GRCh37 reference and produced decent looking output - thanks.

I may have found an omission:

In the abundance.gene.tsv it seems the length column is present in the header but missing in the file. Am I right ?

#gene_id gene_name length est_counts tpm
ENSG00000239196 AL591856.3 0.000000 0.000000
ENSG00000266603 AL591856.7 0.000000 0.000000
ENSG00000263662 AL591856.4 0.000000 0.000000
ENSG00000220023 AL592183.1 0.000000 0.000000

VERSION: hera-v1.0
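
The header/row mismatch described above can be checked mechanically. The following is a minimal sketch, not part of hera: the function name `find_short_rows` is hypothetical, and it assumes fields can be split on any run of whitespace (a file with empty tab-separated fields would need `sep="\t"` instead). The sample rows are copied from the report above.

```python
import io


def find_short_rows(lines, sep=None):
    """Return (header_field_count, [(line_number, field_count), ...]) for rows
    whose number of fields differs from the header's.

    sep=None splits on any whitespace; this is an assumption about the file.
    """
    header = None
    mismatches = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if not fields or fields == [""]:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line is treated as the header
            continue
        if len(fields) != len(header):
            mismatches.append((lineno, len(fields)))
    return (len(header) if header else 0), mismatches


# Rows taken from the report above: five header names, four values per row.
sample = io.StringIO(
    "#gene_id gene_name length est_counts tpm\n"
    "ENSG00000239196 AL591856.3 0.000000 0.000000\n"
    "ENSG00000266603 AL591856.7 0.000000 0.000000\n"
)
n_header, bad = find_short_rows(sample)
print(n_header, bad)
```

To run it on a real file, pass an open file handle instead of the `StringIO` sample, for example `find_short_rows(open("abundance.gene.tsv"))`.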

micknudsen commented 7 years ago

Same thing here. Furthermore, all values in the last column are nan or -nan in my output.

#gene_id        gene_name       length  est_counts      tpm
1:ENSG00000210049.1     .1MT-TF 0.000000        nan
2:ENSG00000211459.2     .2MT-RNR1       10778.000000    nan
1:ENSG00000210077.1     .1MT-TV 0.000000        nan
2:ENSG00000210082.2     .2MT-RNR2       35246.000000    nan
1:ENSG00000209082.1     .1MT-TL1        0.000000        nan
2:ENSG00000198888.2     .2MT-ND1        10612.000000    nan
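
A quick way to see how widespread the `nan` / `-nan` values are is to count rows whose last field is not a finite number. This is a minimal sketch under stated assumptions: `count_non_finite` is a hypothetical helper, fields are split on whitespace, and the last two sample rows are made up for illustration (only the first two come from the output above).

```python
import math


def count_non_finite(lines):
    """Return (bad, total): data rows whose last field is NaN or infinite,
    and the total number of data rows. Lines starting with '#' are skipped."""
    bad = total = 0
    for line in lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue
        total += 1
        try:
            value = float(fields[-1])  # float() accepts "nan" and "-nan"
        except ValueError:
            continue  # last field is not numeric; not counted as bad
        if not math.isfinite(value):
            bad += 1
    return bad, total


sample = [
    "#gene_id gene_name length est_counts tpm",
    "1:ENSG00000210049.1 .1MT-TF 0.000000 nan",
    "2:ENSG00000211459.2 .2MT-RNR1 10778.000000 nan",
    "GENE_X NAME_X 12.000000 -nan",      # hypothetical row, illustration only
    "GENE_Y NAME_Y 12.000000 3.500000",  # hypothetical row, illustration only
]
bad, total = count_non_finite(sample)
print(bad, total)
```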

kspham commented 7 years ago

Colin, thank you, that's right! We will remove the length column from the header.

kspham commented 7 years ago

Michael, thank you very much for your bug report, and sorry for the inconvenience. We have released a new version, 1.0.1, which fixes the bug and adds some minor features that users have requested. Please get the new version and let us know whether the bug is gone in your case.

kspham commented 7 years ago

Colin and Michael, please let me know whether your issues have been fixed.

micknudsen commented 7 years ago

@kspham I am running a test now, but for some reason it is really, really slow (it has been running for 12+ hours now). I will let you know when I have results.

kspham commented 7 years ago

Michael, this is very strange. Usually it should finish within a couple of minutes. Is there a way for us to reproduce the scenario to see what's going on? -Son

micknudsen commented 7 years ago

@kspham My first run (~50 million 2x75bp reads) used 13 CPU days of computation. It maxed out all 24 cores most of the time. I have found some publicly available data. I will try running on that and see how long it takes. Stay tuned...

bioturing commented 7 years ago

Also, can you send me a screenshot of hera's output on the terminal? It may help us see at which step it got stuck.

micknudsen commented 7 years ago

This is a typical scenario:

(hera_test) [michaelk@s03n18 hera_test]$ hera quant -i reference/ -t 16 -o output/ ../merged_fastq/Q30M00029R_RNA_R1.fastq.gz ../merged_fastq/Q30M00029R_RNA_R2.fastq.gz
Hera is a program developed by BioTuring for RNA-Seq analysis.
Please contact info@bioturing.com if you need further support
Number of processed pairs   : 100000

No progress for about 10 minutes, and CPU usage is 1600%.

bioturing commented 7 years ago

Very strange -- please help us reproduce the bug with some public data!! Thank you!

micknudsen commented 7 years ago

I will have very little time to do troubleshooting in the next few weeks, but I promise to get back to you after that.

colindaven commented 7 years ago

Hi, afraid I can't really test the output on v1.0.1, as it doesn't produce any output at all (repeatedly attempted with same data as before and below).

I downgraded (using conda) to 1.0.0 and can get output as before. I cannot share a test case, as this is cancer data, sorry.

Best, Colin

colindaven commented 7 years ago

Because I have scripted the command, I was not seeing the on-screen output, which might have been helpful. It might also help to catch errors and dump them to a progress/log file in the output folder.

Searching further, the error is a segfault on v1.0.1

hera quant -i /lager2/rcug/seqres/HS/hera/GRCh37 -t 16 -z 7 -f /lager2/rcug/seqres/HS/hg19.fa -o 22839_S3_L001_R1.hera 22839_S3_L001_R1.fastq 22839_S3_L001_R2.fastq
Hera is a program developed by BioTuring for RNA-Seq analysis.
Please contact info@bioturing.com if you need further support
Segmentation fault (core dumped)
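
When a command runs from a script, a crash like this is easy to miss. The sketch below shows one way a wrapper could capture everything a child process prints into a log file and report whether it was killed by a signal. It is an illustration only: `run_and_log` is a hypothetical helper, the demo runs a stand-in Python child rather than hera, and the signal check relies on the POSIX behaviour of `subprocess` (a negative return code is the negated signal number). Note that the text "Segmentation fault (core dumped)" is normally printed by the shell, not by the program, so the return code is the reliable indicator.

```python
import os
import signal
import subprocess
import sys
import tempfile


def run_and_log(cmd, log_path):
    """Run cmd, write its stdout and stderr to log_path, return a status string."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if proc.returncode < 0:
        # POSIX only: the child was terminated by signal number -returncode.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exit code {proc.returncode}"


# Demo with a stand-in child process; replace the list with the real command.
log_path = os.path.join(tempfile.mkdtemp(), "run.log")
status = run_and_log([sys.executable, "-c", "print('hello')"], log_path)
print(status)
```

A segfaulting child would be reported as `killed by SIGSEGV`, and whatever it printed before crashing would still be in the log file.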

GinnyAquarius commented 7 years ago

Hi colindaven, please try again with a trailing '/' after the output path (-o 22839_S3_L001_R1.hera/) and tell us whether it works. The error appeared because hera could not open the output file for writing. We expected this to be fixed in this version, but it seems it is not. Thank you for informing us. Best, Bioturing Algorithm Team.
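
For scripted runs, the trailing-slash workaround can be applied on the caller's side before the path is passed to `-o`. This is a minimal sketch, not a description of how hera builds its output paths: `as_output_dir` is a hypothetical helper, and creating the directory up front is an extra precaution assumed here, not something the thread confirms hera requires.

```python
import os
import tempfile


def as_output_dir(path):
    """Create path as a directory if needed and return it with a trailing separator."""
    os.makedirs(path, exist_ok=True)
    # Joining with an empty component appends the separator only if it is missing.
    return os.path.join(path, "")


# Demo inside a temporary directory, using the output name from the command above.
base = tempfile.mkdtemp()
out_dir = as_output_dir(os.path.join(base, "22839_S3_L001_R1.hera"))
print(out_dir)
```

The returned string can then be used as the `-o` argument; calling the helper on a path that already ends in a separator returns it unchanged.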

colindaven commented 7 years ago

Ok, the "/" after the -o solved the issue on 1.0.1. Thanks for keeping this on conda, it saves a lot of people time and simplifies environment issues. Colin