Closed dimitras closed 2 years ago
Sorry for the late response.
One thing I'm confused about is how to get the correct meme file for the motifs from hg19. I have tried downloading from meme suite but I suppose the memes provided there are based on the latest genome?
Motif file is not assembly-specific. Here is the motif file I used for the human genome: https://taiji-pipeline.github.io/documentation/_downloads/cisBP_human.meme.
Another question I have is about the input.tsv. For the ATAC, is it enough if I give for each sample the narrow peaks bed file like:
If you have bam files, it is better to include that as well, using:
type id group rep path format
ATAC-seq h40 cond1 1 h40.narrow_Peaks.bed NarrowPeak
ATAC-seq h40 cond1 1 h40.bam Bam
For the RNA, my input per sample looks like that: RNA-seq H80 cond1 60 H80.abundance.txt GeneQuant Expression profile
Correct format should be something like this:
type id group rep path tags
RNA-seq H80 cond1 60 H80.abundance.txt GeneQuant
Here is an example file: https://taiji-pipeline.github.io/documentation/_downloads/example_input.tsv
I even tried with giving a fai file as well. Created .fai with samtools and added to config:
Genome index is generated by Taiji. Don't generate it by yourself.
Thank you for your response Kai.
I have an issue that I am not sure how to resolve. Taiji runs up to the Confusion_Table step and then crashes. I wonder if there is sth in the bed or bad format I should check or maybe it is a resources issue?
[INFO][04-18 12:20] Confusion_Table(f7b8..): Running ...
[INFO][04-18 12:20] Confusion_Table(f7b8..): Complete!
[INFO][04-18 12:20] ATAC_Make_Index(d30b..): Running ...
[ERROR][04-18 12:20] ATAC_Make_Index(d30b..) Failed: user error (call: remote process died: DiedDisconnect)
CallStack (from HasCallStack):
error, called at src/Control/Workflow/Interpreter/Exec.hs:146:37 in SciFlow-0.8.0-IRKsT2ba9M716PeGlwt2FT:Control.Workflow.Interpreter.Exec
[ERROR][04-18 12:20] Program exits with errors
The output looks like:
output/
├── ATACSeq
│ └── Download
├── GENOME
│ └── HG19.index
├── RNASeq
│ ├── Download
│ ├── expression_profile.tsv
│ ├── Gt001.BM_rep8_gene_quant.tsv
│ ├── Gt001.BN_rep8_gene_quant.tsv
........
── SCATACSeq
└── Spectral
The sciflow.db file for last two steps looks like:
taiji show Confusion_Table
<< Confusion_Table(f7b838d7b32a98cefcdccfb0946a253e56b1c6b7ef47c502fb141a0e1c15cd5a) >>
()
taiji show ATAC_Make_Index
(which returns nothing)
Is the ATAC_Download_Data step as it should be?
taiji show ATAC_Download_Data
<< ATAC_Download_Data(fe0fc639c515e5d42e63fff5fbb95d4269c9f5d607c771aff63f0a834834b4d7) >>
ATACSeq
{ atacseqCommon =
CommonFields
{ _commonEid = "Gt111.BN"
, _commonGroupName = Just "BN"
, _commonSampleName = ""
, _commonReplicates =
fromList
[ ( 1
, Replicate
{ replicateFiles =
[ Left
( []
, Bed
, File
{ fileLocation =
"MACS2.471.narrow_Peaks.bed"
, fileInfo = fromList []
}
)
, Left
( []
, Bam
, File
{ fileLocation =
"471.bam"
, fileInfo = fromList []
}
)
]
, _replicateInfo = fromList []
}
)
]
}
}
Any insight would be so appreciated. Let me know if you need more info.
Thanks again, Dimitra
The ATAC_Make_Index
step is failing. You could run:
Taiji run --config your_config_file --select ATAC_Make_Index
This runs the ATAC_Make_Index
step only. Please see if the error message is more clear this time. If you are running the program using a workload manager/distributed computing, please look into the log files as well.
Thank you for your quick response!
The slurm has logged:
/var/spool/slurm/slurmd/job36819951/slurm_script: line 3: 73488 Bus error master_id=tsetmtnhyfmyfxtf taiji remote --ip cn2316 --port 47481
But yes let me try to run ATAC_Make_Index as you suggested.
Running only the ATAC_Make_Index step, I got the exact same error:
[INFO][04-18 15:10] Found a new worker: pid://cn1042:8000:0:8
[INFO][04-18 15:10] ATAC_Make_Index(d30b..): Running ...
[ERROR][04-18 15:10] ATAC_Make_Index(d30b..) Failed: user error (call: remote process died: DiedDisconnect)
CallStack (from HasCallStack):
error, called at src/Control/Workflow/Interpreter/Exec.hs:146:37 in SciFlow-0.8.0-IRKsT2ba9M716PeGlwt2FT:Control.Workflow.Interpreter.Exec
[ERROR][04-18 15:10] Program exits with errors
And in the slurm log:
/var/spool/slurm/slurmd/job36841772/slurm_script: line 3: 26442 Bus error master_id=uwqjacxmjumfhbaz taiji remote --ip cn0910 --port 52092
ATAC_Make_Index
calls bwa
to build the index which uses a lot of memory. Please make sure you request enough memory when using slurm. For example, 50G
I was giving it 80G, but I gave it 120G this time. It was both memory and apparently a permission issue. The index file that was generated in an earlier step didn't have exec permission.
Generating genome index ...
[INFO][04-18 15:24] ATAC_Make_Index(d30b..): Running ...
[ERROR][04-18 15:24] ATAC_Make_Index(d30b..) Failed: output//GENOME/HG19.index: openFile: resource busy (file is locked)
CallStack (from HasCallStack):
error, called at src/Control/Workflow/Interpreter/Exec.hs:146:37 in SciFlow-0.8.0-IRKsT2ba9M716PeGlwt2FT:Control.Workflow.Interpreter.Exec
[ERROR][04-18 15:24] Program exits with errors
-rw-r----- 1 dimitras 0 Apr 18 15:10 output/GENOME/HG19.index
After giving it exec permission, it FINISHED :)
Thank you so much for your help troubleshooting this!
Can I run it to go on through all the steps till the end? Or better run it from the beginning?
just run taiji run --config your_config.yml
normally.
Hi Kai, thank you so much for your help. I haven't posted yet because I didn't manage to run it successfully to the end yet. I was in the middle of troubleshooting but had to pause for a few days. I'll post again if I don't manage to resolve it. Thank you again!
Hi Kai, I am trying to run Taiji on my bulk ATAC- and RNA-Seq data. The assembly used is hg19.
So my config file looks like:
One thing I'm confused about is how to get the correct meme file for the motifs from hg19. I have tried downloading from meme suite but I suppose the memes provided there are based on the latest genome?
Another question I have is about the input.tsv. For the ATAC, is it enough if I give for each sample the narrow peaks bed file like:
Or should I give the bam file as well? Like:
For the RNA, my input per sample looks like that:
RNA-seq H80 cond1 60 H80.abundance.txt GeneQuant Expression profile
And then running
taiji run --config config.yml --cloud
I get the following error:It has created the sciflow.db and in the output the only thing is created is:
I even tried with giving a fai file as well. Created .fai with samtools and added to config:
genome_index: "annotation/Homo_sapiens.GRCh37.dna.primary_assembly.fa.fai"
But I got the exact same error.
Any insights would be greatly appreciated! Thank you!