Taiji-pipeline / Taiji

All-in-one analysis pipeline
https://taiji-pipeline.github.io/
BSD 3-Clause "New" or "Revised" License
33 stars 9 forks source link

Program exits with erros. #4

Closed ShikanZ closed 4 years ago

ShikanZ commented 4 years ago

Hi I am recently runing Taiji pipeline, intergrated analysis of RNAseq and ATACseq. However, it gives me this error message "Program exits with errors". I have no idea what the errors are. Right now the output file has ATAC folder and RNA folder. RNA folder is empty, while ATAC folder has Bam, Bed, BigWig, GeneQuant, Peaks, QC, and TFBS.

Thanks,

Shikan

kaizhang commented 4 years ago

Do you have the full log?

ShikanZ commented 4 years ago

Log.zip

kaizhang commented 4 years ago

@ShikanZ I need the terminal log produced by taiji. If you did not record that, you can run the program with the same command again. It will not rerun the finished steps and it should reproduce any error message.

ShikanZ commented 4 years ago

Those are the errors. I don't need the RSEM indices. Generating RSEM indices [ERROR][05-29 11:49] RNA_Make_Index(b29a..) Failed: Ran commands: mkdir -p /mnt/Data/Cui_lab/Shen/taiji_output/GENOME/RSEM_index rsem-prepare-reference --gtf /mnt/Data/references/GRCm38/Sequence/Annotation/Genes/genes.gtf /mnt/Data/references/GRCm38/Sequence/STARIndex/genome.fa /mnt/Data/Cui_lab/Shen/taiji_output/GENOME/RSEM_index/MM10 which rsem-prepare-reference Exception: user error (shelly did not find rsem-prepare-reference in the PATH: /home/shikan/miniconda3/bin:/home/shikan/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin) CallStack (from HasCallStack): error, called at src/Control/Workflow/Interpreter/Exec.hs:134:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec

[ERROR][05-29 14:10] ATAC_Make_Expr_Table(0509..) Failed: Prelude.tail: empty list CallStack (from HasCallStack): error, called at src/Control/Workflow/Interpreter/Exec.hs:134:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec

[ERROR][05-29 16:33] Program exits with errors

kaizhang commented 4 years ago

RSEM is needed for RNA quantification. If you have processed RNA-seq by yourself, you can input those quantifications directly.

ShikanZ commented 4 years ago

I used salmon for RNA quantification. How do I input those files? Instead of inputing RNA fastq files, I will put quant files in the input.yml?

ShikanZ commented 4 years ago

And also what about the second and third errors?

kaizhang commented 4 years ago

Yes, see here: https://taiji-pipeline.github.io/documentation/input.html#rna-seq.

If you are using TSV input file, then put:

RNA-seq id group_name 1 filepath GeneQuant

For the second error, go to the GeneQuant directory and see if those files are empty.

ShikanZ commented 4 years ago

GeneQuant has 4 files but they are empty

kaizhang commented 4 years ago

What annotation file are you using?

ShikanZ commented 4 years ago

GTF file from Ensembl.

kaizhang commented 4 years ago

Please check if you have "gene" feature in the 3rd column.

ShikanZ commented 4 years ago

No it doesn't have "gene" feature in the 3rd column

kaizhang commented 4 years ago

You need to use the full annotation with "gene", "transcript", etc. The one you used is probably for transcriptome analysis.

After updating the annotation file, run: taiji delete ATAC_Compute_TE ATAC_Gene_Count --delete-all to update the cache and then run the program again.

ShikanZ commented 4 years ago

gene feature I'm sorry. I think it has the gene feature

kaizhang commented 4 years ago

I didn't see any "gene" in the 3rd column. Also please note that ENSEMBL use "1,2,3..." to name chromosomes. Please make sure you use the same chromosome naming scheme during the rest of your analysis.

Example from the ensembl website:

1 transcribed_unprocessed_pseudogene  gene        11869 14409 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; 
ShikanZ commented 4 years ago

I have got this new error.

[ERROR][06-04 01:23] RNA_Make_Expr_Table(39aa..) Failed: readDouble: Fail to cast ByteString to Double:"Length" CallStack (from HasCallStack): error, called at src/Bio/Utils/Misc.hs:24:14 in bioinformatics-toolkit-0.9.3.1-CNRcUDPNpO01sIdCsENaFx:Bio.Utils.Misc CallStack (from HasCallStack): error, called at src/Control/Workflow/Interpreter/Exec.hs:134:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec

kaizhang commented 4 years ago

Do not put anything additional to the expression file. Just the gene and expression value:

Gene1 <TAB> 12
Gene2 <TAB> 20
Gene3 <TAB> 25
ShikanZ commented 4 years ago

And also I'm using fastq files to do the analysis. Here is the error. [ERROR][06-04 08:32] ATAC_QC(df4d..) Failed: Statistics.Matrix.fromRows: zero columns in matrix CallStack (from HasCallStack): error, called at src/Statistics/Matrix.hs:93:23 in dense-linear-algebra-0.1.0.0-K4G29RdkqcAAh5dSEMcmWb:Statistics.Matrix CallStack (from HasCallStack): error, called at src/Control/Workflow/Interpreter/Exec.hs:134:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec

kaizhang commented 4 years ago

Please check the sizes of bed files and peak files to see if there is anything wrong.

ShikanZ commented 4 years ago

They are 0 byte.

kaizhang commented 4 years ago

You can run samtools flagstat on the bam files to see if there was something wrong during the alignment.

ShikanZ commented 4 years ago

I looked at couple of them, They are pretty much look like this

79726438 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 67590366 + 0 mapped (84.78% : N/A) 79726438 + 0 paired in sequencing 39863219 + 0 read1 39863219 + 0 read2 0 + 0 properly paired (0.00% : N/A) 62821202 + 0 with itself and mate mapped 4769164 + 0 singletons (5.98% : N/A) 7078490 + 0 with mate mapped to a different chr 1240900 + 0 with mate mapped to a different chr (mapQ>=5)

kaizhang commented 4 years ago

Just to confirm: the files in the BED directory are also empty? If so how about the files in the Bam directory? Are there any close to empty file?

ShikanZ commented 4 years ago

Bam files are normal, but Bed files are empty.

kaizhang commented 4 years ago

That's weird. I'm not sure what's going on. Maybe try rerunning the Bed conversion:

taiji delete ATAC_Bam_To_Bed
taiji run --config config.yml --select ATAC_Bam_To_Bed

See if the bed files can be produced.

ShikanZ commented 4 years ago

It gives me this "Parts of the program finish successfully" right away.

kaizhang commented 4 years ago

What is the result of ls -lh on the directory containing bam files.

ShikanZ commented 4 years ago

bam bam_size

kaizhang commented 4 years ago

It seems that all reads were filtered out. And I saw a problem from the flagstat output:

79726438 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 67590366 + 0 mapped (84.78% : N/A) 79726438 + 0 paired in sequencing 39863219 + 0 read1 39863219 + 0 read2 0 + 0 properly paired (0.00% : N/A) 62821202 + 0 with itself and mate mapped 4769164 + 0 singletons (5.98% : N/A) 7078490 + 0 with mate mapped to a different chr 1240900 + 0 with mate mapped to a different chr (mapQ>=5)

The reads are not properly paired. Could you post the input file?

ShikanZ commented 4 years ago

RNA-seq:

ATAC-seq:

kaizhang commented 4 years ago

The input looks good. I'm not sure why the reads are not properly paired. The alignment was created by bwa mem -M -k 32.

ShikanZ commented 4 years ago

I also create another input.yml file. Can you check if there is anything incorrect?

RNA-seq:

ATAC-seq:

kaizhang commented 4 years ago

Looks good to me!

kaizhang commented 4 years ago

But make sure the reads in bam files are properly paired!

ShikanZ commented 4 years ago

The pipeline finished successfully, but I found that in the RNAseq(input fastq files) folder there are two samples are missing. I have 3 groups, each group having 3 replicates. However, one group only has the rep 3 in the folder, rep 1 and rep2 are missing.

ShikanZ commented 4 years ago

I think I found out what happened. I've made a mistake in the input.yml.