bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

In Small RNAseq analysis, script path issue #1351

Closed hmkim closed 8 years ago

hmkim commented 8 years ago

I'm testing the example small RNA-seq pipeline in bcbio.

While checking the pipeline output, I found the following in the error log:

mirqc_bcbio/work/mirdeep2$ less error_res.log

RNAfold: invalid option -- n
sh: 1: perform_controls.pl: not found
RNAfold: invalid option -- n
sh: 1: perform_controls.pl: not found
RNAfold: invalid option -- n
sh: 1: perform_controls.pl: not found

I think this issue is caused by the script path.

Could you check this issue?
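One quick way to test the script-path hypothesis is to ask whether the names from the log can be resolved on the current `PATH`. The following is only a minimal sketch: `missing_from_path` is a hypothetical helper, and it assumes the check is run in the same environment (same `PATH`) as the pipeline job.

```python
import shutil


def missing_from_path(names):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Both names are taken from error_res.log above.
    for name in missing_from_path(["perform_controls.pl", "RNAfold"]):
        print(f"not on PATH: {name}")
```

If `perform_controls.pl` is reported as missing, that is consistent with the `sh: 1: perform_controls.pl: not found` lines. It says nothing about the separate `RNAfold: invalid option -- n` message.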

lpantano commented 8 years ago

Hi,

thanks for checking this. I reproduced it. Even with these errors you should get results in the mirdeep2 folder; can you confirm this? The errors come from some controls calculated in the pipeline and shouldn't affect the final output.

I am updating the tool in bioconda, so updating the bcbio tools should fix it. You will probably need to wait a couple of hours for the package to be updated on the conda server.

Let me know if you hit further errors with this in the future.

thanks for checking!

AurelieMLB commented 8 years ago

Hello,

Apologies for the delay. I hope it is not too late. I have double-checked. We do have the mirdeep2 folder in work, with these files and directories:

ls -R *
align.bam error_res.log file_reads.fa miRNAs_expressed_all_samples_res.csv result_res.bed

dir_prepare_signature:
mature_vs_precursors.arf precursors.ebwt.2.ebwt precursors.ebwt.rev.1.ebwt reads_vs_precursors.arf signature_unsorted.arf.tmp mature_vs_precursors.bwt precursors.ebwt.3.ebwt precursors.ebwt.rev.2.ebwt reads_vs_precursors.bwt signature_unsorted.arf.tmp2 precursors.ebwt.1.ebwt precursors.ebwt.4.ebwt precursors.fa signature_unsorted.arf

expression_analyses:
expression_analyses_res

expression_analyses/expression_analyses_res:
bowtie_mature.out file_reads.fa_mapped.bwt mature.fa_mapped.bwt miRNA_precursor.1.ebwt miRNA_precursor.rev.1.ebwt rna.ps bowtie_reads.out mature2hairpin miRBase.mrd miRNA_precursor.2.ebwt miRNA_precursor.rev.2.ebwt file_reads.fa.converted mature.converted miRNA_expressed.csv miRNA_precursor.3.ebwt precursor.converted file_reads.fa_mapped.arf mature.fa_mapped.arf miRNA_not_expressed.csv miRNA_precursor.4.ebwt read_occ

mirdeep_runs:
run_res

mirdeep_runs/run_res:
output.mrd run_res_parameters survey.csv tmp

mirdeep_runs/run_res/tmp:
align.bam_parsed.arf command_line mature.fa precursors.coords_all precursors.fa_stack binaries file_reads.fa output_permuted.mrd precursors.fa precursors.str binaries2 hairpin.fa precursors.coords precursors.fa_all signature.arf

However, based on the documentation I was expecting a file called counts_mirna_novel.tsv, and I cannot find it in either the work or the final folder. Is this expected?

Thanks a lot!
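To rule out the file simply sitting in a subdirectory, the work and final folders can be searched recursively. This is a minimal sketch; `find_output` is a hypothetical helper, and the folder names `work` and `final` are assumed to be relative to the directory the script is run from.

```python
from pathlib import Path


def find_output(filename, roots):
    """Recursively collect every path named `filename` under the given roots."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():  # silently skip roots that do not exist
            hits.extend(sorted(root.rglob(filename)))
    return hits


if __name__ == "__main__":
    hits = find_output("counts_mirna_novel.tsv", ["work", "final"])
    if hits:
        for path in hits:
            print(path)
    else:
        print("counts_mirna_novel.tsv not found under work/ or final/")
```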

pierduemila commented 8 years ago

Hello, I am also testing the small RNA pipeline with bcbio and I got stuck at the following point.

Below the last lines of the log file where the run stopped:

INFO locus bigger > 500 nt, skipping: [[44892, u'chr22', 39312883, 39313908, u'-', 132]]
INFO locus bigger > 500 nt, skipping: [[6690, u'chr1', 226388694, 226390481, u'-', 15]]
[41% |############################# |
INFO locus bigger > 500 nt, skipping: [[54228, u'chr5', 77708307, 77776329, u'-', 18]]

I have also noticed 2 core files in the work folder, both reporting something like: core.25739: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'RNAfold'

Is it possible that something went wrong with RNAfold? My fragments are 25-30 bp long after trimming; could this error be caused by such short reads? If not, do you have any suggestions? We updated the bcbio tools last week.

Thanks for your support!
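The `locus bigger > 500 nt, skipping` lines in the log above can be summarized to see how large the skipped loci are. This is a rough sketch that assumes the bracketed fields are (id, chromosome, start, end, strand, count); that field order is a guess from the log text, not something confirmed in this thread, and `skipped_loci` is a hypothetical helper.

```python
import re

# Matches the bracketed record at the end of each "skipping" line.
# Assumed field order: id, chromosome, start, end, strand, count.
LOCUS_RE = re.compile(
    r"\[\[(\d+), u?'([^']+)', (\d+), (\d+), u?'([+-])', (\d+)\]\]"
)


def skipped_loci(log_text):
    """Return (chromosome, start, end, length) for every skipped locus."""
    loci = []
    for match in LOCUS_RE.finditer(log_text):
        start, end = int(match.group(3)), int(match.group(4))
        loci.append((match.group(2), start, end, end - start))
    return loci


if __name__ == "__main__":
    line = ("INFO locus bigger > 500 nt, skipping: "
            "[[44892, u'chr22', 39312883, 39313908, u'-', 132]]")
    for chrom, start, end, length in skipped_loci(line):
        print(f"{chrom}:{start}-{end} spans {length} nt")
```

Under that assumption, the chr22 locus spans 39313908 - 39312883 = 1025 nt, which is consistent with it being over the 500 nt limit. These INFO lines report skipped loci only and do not by themselves explain the RNAfold core files.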

lpantano commented 8 years ago

Hi @pierduemila,

Sorry about this. How long has it been running? Sometimes that step can take a long time. Can you post the config YAML file, please?

I pushed a big change to the pipeline last Friday. It simplifies the pipeline to miRNA calling only, and you can add more analyses on top. I did this because each of them depends heavily on the species. See this config file: https://github.com/chapmanb/bcbio-nextgen/blob/master/config/templates/illumina-srnaseq.yaml (you can remove the expression_callers line to test the minimal pipeline)

If you are only interested in miRNA, you can update bcbio to the latest development version and try again. Of course, if you want to run everything (because the species is human or mouse), then it is fine as it is.

If it has been running for more than 1 day, I can try to help debug your specific case. For that I would need some data, though I don't know whether you can share it with me.

cheers

pierduemila commented 8 years ago

Hello, thanks for your prompt support. my.yaml details:

The log output hasn't been updated since last night, but the submitted job still has a running status (apparently). I will update bcbio with bcbio_nextgen.py upgrade --tools and keep you posted. I know that the quality of this run is bad, and I am trying to get counts from only one of the 2 reads.

cheers

pierduemila commented 8 years ago

Hello, we have updated the bcbio tools and I am now using 0.9.9, but when I use the YAML file you linked above my run gets stuck with the same error (my 1st comment, Jun 16).

This is the last cmd run in bcbio-nextgen-commands.log: /apps/bcbio-nextgen/0.9.9/rhel6-x64/anaconda/bin/seqcluster report -o /prepSample_STAR_v3/work/seqcluster/report -r /Hsapiens/hg19/seq/hg19.fa -j /prepSample_STAR_v3/work/seqcluster/cluster/seqcluster.json

One (poor) workaround I found, based on the fact that I have the count files in the final folder, is to remove seqcluster from the 'expression caller' setting in the YAML.

I know that seqcluster parses the BAM files to produce the miRNA annotation, including isomiRs, and collapses them, but I will need to dig further into the log files to understand what happened.

Any feedback/insights at this point would be great.

For instance, I have noticed that if I do not specify the expression caller at all, I get a self-explanatory error on how to modify the YAML.

Bw P.
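The workaround described above amounts to dropping one entry from the expression_caller setting. The sketch below shows that edit on an already-loaded config, under stated assumptions: the config is a plain dict with a top-level `details` list whose items carry an `algorithm` mapping, and the example values are illustrative only. `remove_caller` is a hypothetical helper, not part of bcbio.

```python
import copy


def remove_caller(config, caller):
    """Return a copy of `config` with `caller` removed from every
    algorithm.expression_caller entry (assumed config layout)."""
    updated = copy.deepcopy(config)
    for sample in updated.get("details", []):
        algorithm = sample.get("algorithm", {})
        callers = algorithm.get("expression_caller")
        if callers is None:
            continue  # nothing configured for this sample
        if isinstance(callers, str):
            callers = [callers]  # tolerate a single value instead of a list
        algorithm["expression_caller"] = [c for c in callers if c != caller]
    return updated


if __name__ == "__main__":
    # Illustrative config only; real templates may contain other keys and values.
    example = {
        "details": [
            {"algorithm": {"expression_caller": ["seqcluster", "mirdeep2"]}}
        ]
    }
    print(remove_caller(example, "seqcluster"))
```

The function works on a copy, so the original config stays available for comparison. How bcbio treats an empty expression_caller list is not covered here.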

lpantano commented 8 years ago

Hi,

sorry about that. Can you send me the *debug.log file?

I will look into your last comment. If there is no expression_caller option, it should at least do miRNA.

If you only want miRNA/isomiRs, then that would be enough. All the miRNA-only counts should be inside the mirbase folder. Do you have that folder?

cheers

pierduemila commented 8 years ago

Yes, I do have the folder mirbase. And in the final folder I have these files: counts_mirna_novel.tsv counts_mirna.tsv counts_novel.tsv counts.tsv

I think the tRNA counts are missing.

pierduemila commented 8 years ago

Hi Lorena, just wanted to update you on this. The error (related to seqcluster) was fixed after updating to the development tools. Thanks for your support.

lpantano commented 8 years ago

Thanks for the update! I am happy it works again!