Error: Program exits with errors during configuration

claraina commented 3 years ago

I have downloaded and installed Taiji on Ubuntu, and am currently trying to follow the Taiji tutorial. I have downloaded both of the example files and edited example_config according to the Configuration page. However, when I run example_config.yml, I get an error saying 'Program exits with errors during configuration'. I'm new to programming and not very good with coding, so I am wondering whether I've installed Taiji correctly and how to solve this issue. Thank you.

kaizhang commented 3 years ago

The problem is with the format of your input file. Could you paste your input and config file here?

claraina commented 3 years ago

GitHub won't let me upload yml files, so I've uploaded txt copies for this post - but I was using the yml versions of these files when running the code.

example_input.txt example_config.txt

kaizhang commented 3 years ago

Did you use the correct file name/path (for input) in the config file?

claraina commented 3 years ago

I think so. Even when I edit the config yml file to have input: "data/example_input.yml", I still get the same error. I am not sure about the path of the file. Should I have the files in a specific folder? Currently they are in the root folder - the same folder where taiji outputs the Taiji workflow HTML file (when I run taiji view taiji.html).

kaizhang commented 3 years ago

Then I guess the correct path should be "example_input.yml", NOT "data/example_input.yml"
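A relative input path only works if it points at the file from the folder the command is launched in, which is why "data/example_input.yml" fails when the file sits directly in that folder. The following is a minimal Python sketch of that check. It assumes Taiji resolves relative paths against the working directory, which this thread does not confirm; resolve_input and the temporary folder are hypothetical names used only for the demonstration.

```python
import tempfile
from pathlib import Path


def resolve_input(input_value: str, run_dir: str) -> Path:
    """Return the location a config `input` value points to.

    Assumption: relative values are resolved against the directory
    the pipeline is launched from (run_dir).
    """
    path = Path(input_value)
    return path if path.is_absolute() else Path(run_dir) / path


# Demonstration in a throwaway folder that mimics the layout described
# in the thread: the input file sits directly in the run folder.
with tempfile.TemporaryDirectory() as run_dir:
    (Path(run_dir) / "example_input.yml").write_text("# placeholder\n")
    found_plain = resolve_input("example_input.yml", run_dir).is_file()
    found_nested = resolve_input("data/example_input.yml", run_dir).is_file()

print("example_input.yml found:", found_plain)
print("data/example_input.yml found:", found_nested)
```

Running the same check against the real folder before starting the pipeline shows quickly whether the value of input in the config matches where the file actually lives.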

claraina commented 3 years ago

I changed the path, but this time I get another type of error when I run the line

kaizhang commented 3 years ago

The id must be unique. Please change one of the two "heart_ATAC" to "heart_HiC"
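Duplicate ids can be spotted before running the pipeline. The sketch below scans the input file as plain text for lines of the form "id: value" and reports any value that appears more than once. It is an illustration under assumptions, not part of Taiji: duplicate_ids is a hypothetical helper, the line-based scan assumes each id is written on its own line, and the sample entries (including the type names) are made up to mirror the "heart_ATAC" clash described above.

```python
import re
from collections import Counter

# Matches lines such as "  id: heart_ATAC" or "- id: 'heart_ATAC'".
ID_LINE = re.compile(r"^\s*-?\s*id:\s*['\"]?([^'\"#\s]+)")


def duplicate_ids(yaml_text: str) -> list:
    """Return the sorted list of id values that occur more than once."""
    ids = []
    for line in yaml_text.splitlines():
        match = ID_LINE.match(line)
        if match:
            ids.append(match.group(1))
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical input fragment reproducing the clash from the thread.
sample = """
- type: ATAC-seq
  id: heart_ATAC
- type: HiC
  id: heart_ATAC
- type: RNA-seq
  id: heart_RNA
"""

print(duplicate_ids(sample))
```

For a real file, read it with open("example_input.yml").read() and pass the text to duplicate_ids; an empty list means no id is repeated.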

claraina commented 3 years ago

That seemed to have worked. However, when it runs the next process it comes up with these errors:

[ERROR][08-06 22:31] ATAC_Download_Data(7b69..) Failed: Ran commands:
which fasterq-dump
which fastq-dump

Exception: Please install sra-tools: https://github.com/ncbi/sra-tools
CallStack (from HasCallStack): error, called at src/Bio/Pipeline/Download.hs:57:22 in bio-pipelines-0.1.3-4gmfUxwk1vs5RwCRspqCSv:Bio.Pipeline.Download
CallStack (from HasCallStack): error, called at src/Control/Workflow/Interpreter/Exec.hs:145:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec

[INFO][08-06 22:31] RNA_Make_Index(2a31..): Running ...
Generating STAR indices
[ERROR][08-06 22:31] RNA_Make_Index(2a31..) Failed: Ran commands:
mkdir -p /root/output/GENOME/STAR_index/
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir output//GENOME/STARindex/ --genomeFastaFiles output//GENOME/GRCH38.fasta --sjdbGTFfile output//GENOME/GRCH38.gtf --sjdbOverhang 100
which STAR

Exception: user error (shelly did not find STAR in the PATH: .............. error, called at src/Control/Workflow/Interpreter/Exec.hs:145:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec
[ERROR][08-06 22:31] Program exits with errors

I can try downloading the SRA Toolkit, but I am not sure what I should do regarding STAR.

kaizhang commented 3 years ago

You need to install STAR to analyze FASTQ files from RNA-seq experiments. You can download STAR from here: https://github.com/alexdobin/STAR

You will also need RSEM later for gene quantification: https://github.com/deweylab/RSEM
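The errors above come from the pipeline running "which" on each external program and failing when nothing is found. The same check can be done up front with Python's standard library, as in this minimal sketch. missing_tools is a hypothetical helper; fasterq-dump, fastq-dump and STAR are the names from the log, while rsem-calculate-expression is RSEM's main executable and is included on the assumption that Taiji calls it, which the thread does not state.

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that are not on the PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executables mentioned in the log, plus one assumed RSEM entry point.
required = [
    "fasterq-dump",               # from sra-tools (log: "which fasterq-dump")
    "fastq-dump",                 # from sra-tools (log: "which fastq-dump")
    "STAR",                       # log: "shelly did not find STAR in the PATH"
    "rsem-calculate-expression",  # assumption: the RSEM program Taiji needs
]

for name in missing_tools(required):
    print("not found on PATH:", name)
```

If a tool is installed but still reported as missing, its folder probably needs to be added to the PATH environment variable of the shell that launches Taiji.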

claraina commented 3 years ago

Sorry for the late response - I have been reinstalling Ubuntu on a VM, and Taiji as well. I am now running my own data (I have attached my input and config files), and I get this error. I am not sure where that specific line of code is coming from (I notice there are two // in the path, which might be causing this?), but I do have the output folder containing GSEXXX. I get a similar error when making an index for my RNA-seq data, and that path also includes two '/' rather than one.

I've also attached my config (I will just be using the standard settings based on assembly = "mm10" even though I do have files listed) and input files. Config and Input.zip
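On the double slash: POSIX path resolution treats repeated slashes inside a path as a single separator, so "output//GENOME" and "output/GENOME" name the same location, and the doubled slash on its own is unlikely to be the cause. A small Python sketch that illustrates this with a path taken from the earlier log (same_posix_path is a hypothetical helper, and the comparison is purely textual, so it does not touch the disk):

```python
import posixpath


def same_posix_path(a: str, b: str) -> bool:
    """Compare two POSIX paths after collapsing redundant separators."""
    return posixpath.normpath(a) == posixpath.normpath(b)


doubled = "output//GENOME/GRCH38.fasta"
single = "output/GENOME/GRCH38.fasta"

print(posixpath.normpath(doubled))
print(same_posix_path(doubled, single))
```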

kaizhang commented 3 years ago

STAR needs a lot of memory (30 to 50 GB). The error means you do not have enough memory.
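A quick way to compare a machine or VM against that figure is to read the installed RAM before starting the run. The sketch below does this on Linux with os.sysconf. It is an illustration only: total_memory_gib and enough_for_star are hypothetical helpers, the 30 GiB default is simply the lower bound quoted above, and the number reported is installed memory, not memory that is currently free.

```python
import os


def total_memory_gib() -> float:
    """Return the installed physical memory in GiB (Linux/Unix only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 2**30


def enough_for_star(available_gib: float, required_gib: float = 30.0) -> bool:
    """Check a memory size against the lower bound quoted in the thread."""
    return available_gib >= required_gib


installed = total_memory_gib()
print(f"installed memory: {installed:.1f} GiB")
print("meets the 30 GiB lower bound:", enough_for_star(installed))
```

On a VM, the value reflects the memory assigned to the guest, so raising the VM's memory allocation is what changes the result.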

claraina commented 3 years ago

Thank you, that seemed to solve the problem for RNA-seq, but I still get the same error for the ATAC-seq data regarding GSEXXX. I have it in fastq.gz form, and even when I extract it normally (not using the command prompt), the fastq file has nothing in it (unlike the ENCFF844G file, which seems to be working fine).

It is now generating the bwa_index, and I am wondering how long this is supposed to take. I've let it run on my laptop for about 12 hours and it is still on the same line - but I'll try increasing my memory again.

kaizhang commented 3 years ago

Sorry for the late response. The running time depends on the configuration of your computer and the size of the dataset, so I cannot answer this question. But 12 hours is not ridiculously long.

maeleck commented 3 years ago

I am not sure if I can ask a question on someone else's issue, but adding -n 3 +RTS -N3 to the command may speed things up by adjusting the number of cores/threads dedicated to running Taiji.

My question is that the Compute_Ranks step sometimes doesn't seem to generate GeneRanks.tsv and other files after processing a bunch of ATAC-seq and RNA-seq data, even though everything seems fine. It kept saying something like "The program finished successfully" after ATAC_make_expr_table, RNA_make_expr_table and a bunch of other steps. I even tried deleting the caches, and I didn't get any error. Still no GeneRanks.tsv. But the software was able to generate GeneRanks.tsv when I used a different set of ATAC-seq and RNA-seq data. I am guessing that Taiji can only recognize some RNA-seq FASTQ files for some reason.

RNA seq data in question https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103039

kaizhang commented 3 years ago

@maeleck Could you open a new issue and upload the input and config files? I will take a look. Thanks!

Taiji-pipeline / Taiji

Error: Program exits with errors during configuration #8