Issue opened by dmalzl.
Okay, it seems I have solved it myself. The culprit was that I provided the tag information in the `format` column, which is not correct and obviously results in an ill-configured run. Renaming the `format` column to `tags` fixed it, and the pipeline now runs without problems.
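For anyone hitting the same problem: the fix comes down to renaming one header column in the input table. Below is a minimal Python sketch that does this. It assumes the input is a tab-separated file whose first line is the header. The function name `rename_header_column` is hypothetical, and it makes no claim about Taiji's other columns.

```python
def rename_header_column(tsv_text: str, old: str = "format", new: str = "tags") -> str:
    """Rename column `old` to `new` in the header (first line) of a TSV string.

    Assumes a tab-separated table with a single header row; all data rows
    are returned unchanged.
    """
    lines = tsv_text.splitlines(keepends=True)
    if not lines:
        raise ValueError("empty input")
    first = lines[0]
    body = first.rstrip("\r\n")
    line_ending = first[len(body):]  # preserve the original newline, if any
    header = body.split("\t")
    if old not in header:
        raise ValueError(f"column {old!r} not found in header {header}")
    header[header.index(old)] = new
    lines[0] = "\t".join(header) + line_ending
    return "".join(lines)
```

To use it, read the input file into a string, pass it through `rename_header_column`, and write the result back. Only the header line changes.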
Hi,
I am currently trying to run Taiji on a set of WT and KO RNA-seq and ATAC-seq data. To avoid messing with previous analyses, I decided to reuse the existing gene quantification, which I did with Subread's featureCounts, and post-processed it to match the format described in the documentation. (I assumed the gene expression values should be raw read counts, judging from the integers used in the format description.) The ATAC-seq data are likewise supplied as already aligned, duplicate-filtered files.
The pipeline starts up and tries to read the RNA-seq data but fails with the following error:
I tried to debug it myself but unfortunately couldn't locate the source code for `RNA_Read_Input`. I have never worked with Haskell or the workflow manager used here, so I am quite lost. Could you please look into it? The config, the input file, and an example of the RNA-seq quantification tables are attached (note that I had to change the suffixes to .txt because GitHub wouldn't let me upload .tsv and .yml files).

The RNA-seq quantification tables were produced by counting reads per exon and summing them per Ensembl gene_id. The resulting table was then filtered to keep only genes with at least 1 read in at least one of the samples (3 replicates per condition = 6 samples). The remaining genes were then mapped to their gene_name (i.e. the gene_name attribute in the GTF file).
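The post-processing steps above can be sketched in a few lines of Python. This is an illustrative reconstruction and not necessarily how the attached tables were produced. It assumes an exon-level featureCounts-style table: `#` comment lines, then a header whose first six columns are Geneid, Chr, Start, End, Strand, Length, followed by one count column per sample. It also assumes a `gene_id -> gene_name` dict was parsed from the GTF beforehand. The function name `gene_counts_from_exon_table` is hypothetical. Summing gene IDs that share a gene_name is an assumed choice, since the description above does not say how such collisions were handled.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed layout: featureCounts writes six annotation columns before the
# per-sample counts (Geneid, Chr, Start, End, Strand, Length).
ANNOTATION_COLUMNS = 6


def gene_counts_from_exon_table(
    lines: Iterable[str],
    id_to_name: Dict[str, str],
    min_count: int = 1,
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Sum exon-level counts per gene_id, filter, and re-key by gene_name."""
    samples: List[str] = []
    header_seen = False
    by_id: Dict[str, List[int]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the featureCounts command comment and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if not header_seen:
            samples = fields[ANNOTATION_COLUMNS:]  # sample names from the header
            header_seen = True
            continue
        counts = [int(x) for x in fields[ANNOTATION_COLUMNS:]]
        totals = by_id.setdefault(fields[0], [0] * len(samples))
        for i, c in enumerate(counts):
            totals[i] += c  # exon counts summed per gene_id

    by_name: Dict[str, List[int]] = {}
    for gene_id, counts in by_id.items():
        if max(counts, default=0) < min_count:
            continue  # drop genes without >= min_count reads in any sample
        name = id_to_name.get(gene_id, gene_id)  # keep the ID if no name is known
        totals = by_name.setdefault(name, [0] * len(samples))
        for i, c in enumerate(counts):
            totals[i] += c  # assumption: gene IDs sharing a gene_name are summed
    return samples, by_name
```

The result is still raw integer counts, consistent with the assumption that Taiji expects raw read counts. Writing it out in the exact layout described in Taiji's documentation is not covered by this sketch.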
Attachments: rnaseq_KO2.txt, taiji_input.txt, taiji_config.txt