Taiji-pipeline / Taiji

All-in-one analysis pipeline
https://taiji-pipeline.github.io/
BSD 3-Clause "New" or "Revised" License

Taiji fails during RNA-seq read of quant data #42

Open dmalzl opened 1 year ago

dmalzl commented 1 year ago

Hi,

I am currently trying to run Taiji on a set of WT and KO RNA-seq and ATAC-seq data. To avoid interfering with previous analyses, I decided to reuse the already existing gene quantifications, which I generated with subread's featureCounts, and postprocessed them to adhere to the format detailed in the documentation (here I assumed the gene expression values to be raw read counts, judging from the integers used in the format description). The ATAC-seq data is also supplied as already aligned and duplicate-filtered reads.

The pipeline starts up and tries to read the RNA-seq data but fails with the following error:

[ERROR][09-01 14:03] RNA_Read_Input(7785..) Failed: user error (call: remote process died: DiedException "Prelude.read: no parse")
CallStack (from HasCallStack):
  error, called at src/Control/Workflow/Interpreter/Exec.hs:146:37 in SciFlow-0.8.0 IRKsT2ba9M716PeGlwt2FT:Control.Workflow.Interpreter.Exec

I tried to debug it myself but unfortunately couldn't locate the source code for RNA_Read_Input, and since I have never worked with Haskell or the workflow manager used here, I am quite lost. Could you please look into it?

Please find the config, input, and an example RNA-seq quant table attached (note that I had to change the suffixes to .txt because GitHub wouldn't let me upload .tsv and .yml files). The RNA-seq quant results were produced by counting reads per exon and summing them per Ensembl gene_id. The resulting table was then filtered to contain only those genes with at least 1 read count in at least one of the samples (3 replicates per condition = 6 samples). The remaining genes were then mapped to their gene_name (i.e. the gene_name attribute in the GTF file).
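For reference, the postprocessing steps described above (sum exon-level counts per gene_id, drop genes with zero counts everywhere, then rename to gene_name) can be sketched roughly like this; the function and variable names are mine, not from Taiji or featureCounts:

```python
def aggregate_counts(exon_rows, id_to_name):
    """Sum exon-level count rows per gene_id, keep genes with at least
    one read in any sample, and map gene_id -> gene_name.

    exon_rows:  iterable of (gene_id, [count per sample]) tuples
    id_to_name: dict mapping gene_id to gene_name (from the GTF)
    """
    totals = {}
    for gene_id, counts in exon_rows:
        acc = totals.setdefault(gene_id, [0] * len(counts))
        for i, c in enumerate(counts):
            acc[i] += c
    # Filter: at least 1 read in at least one sample, then rename.
    # Genes missing from the GTF mapping keep their gene_id.
    return {
        id_to_name.get(gid, gid): counts
        for gid, counts in totals.items()
        if any(counts)
    }
```

For example, two exon rows of the same gene are summed sample-wise, and an all-zero gene is dropped:

```python
rows = [("ENSG1", [0, 2]), ("ENSG1", [1, 0]), ("ENSG2", [0, 0])]
aggregate_counts(rows, {"ENSG1": "Foo"})  # → {"Foo": [1, 2]}
```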

rnaseq_KO2.txt taiji_input.txt taiji_config.txt

dmalzl commented 1 year ago

Okay, it seems I have solved it myself. The culprit was that I provided the tag information in a column named format, which is not correct and evidently results in an ill-configured run. After renaming the format column to tags, the pipeline now runs without problems.
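In case anyone hits the same problem: the fix amounts to renaming the mislabeled header in the input TSV. A minimal sketch, assuming a tab-separated input file with a header row (the column names format/tags follow this issue, not a verified spec):

```python
import csv
import io

def rename_format_to_tags(tsv_text):
    """Rename a header column called 'format' to 'tags' in a
    tab-separated input table; all other cells are left untouched."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if rows:
        rows[0] = ["tags" if h == "format" else h for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Of course a one-off edit in a text editor does the same job; the sketch just makes the change explicit.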