Closed: sk-sahu closed this issue 3 years ago
As @cgpu suggested
The error seems to be coming from this check:

    if (args.task != 'prep' and args.stat
            and (len(args.b1) * len(args.b2) == 0)
            and (len(args.s1) * len(args.s2) == 0)):
        sys.exit('ERROR: while performing statistical analysis, user should provide two groups of samples. Please check b1,b2 or s1,s2.')

We need to check whether one of these conditions is violated, and also find the version actually in use (this snippet is from the latest version, grabbed for quickness).
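To see which inputs trip this check, the condition can be exercised in isolation. The sketch below is not rMATS code: `would_exit` is a hypothetical helper that copies the quoted condition, and it assumes `b1`, `b2`, `s1` and `s2` are strings that are empty when not provided (as the `len(...)` calls suggest). The argument values are illustrative only.

```python
from argparse import Namespace


def would_exit(args):
    """Return True when the quoted check would call sys.exit (hypothetical helper)."""
    return bool(
        args.task != 'prep' and args.stat
        and (len(args.b1) * len(args.b2) == 0)
        and (len(args.s1) * len(args.s2) == 0)
    )


# Illustrative argument sets, not taken from a real run.
both_groups = Namespace(task='both', stat=True, b1='b1.txt', b2='b2.txt', s1='', s2='')
missing_b2 = Namespace(task='both', stat=True, b1='b1.txt', b2='', s1='', s2='')

print(would_exit(both_groups))  # False
print(would_exit(missing_b2))   # True
```

This only models the argument check. It says nothing about what happens when a `b2.txt` path is passed but the file itself is blank.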
@cgpu thanks! This error makes sense though, because the pipeline is not properly making b2.txt - therefore rMATS cannot find it. It is unclear why the b2 generation is glitching on the cloud.
Thanks @angarb, is it working as expected on Sumner? If so, we might get some info on what's different and debug.
@cgpu With other datasets, this step works fine on Sumner. With this particular TCGA input 'bams.csv', we have not yet tested it on Sumner (since we were uncertain how to access the TCGA bam files in the cloud from Sumner). @sk-sahu informed us that if we just give paths to the Google buckets in the bams.csv, we should be able to access them on Sumner. However, @sk-sahu did test the portion of the script that generates b1 and b2 locally, and this did not error.
@cgpu The other piece of evidence suggesting it is a cloud issue is that it works sporadically. There is definitely a randomness to whether or not b2 is generated from the same bams.csv file and rmats_pairs files. @lmurba has done a lot of testing on this.
@angarb @lmurba thanks both, I will work alongside @sk-sahu on this and keep this thread up to date.
@angarb @lmurba @cgpu
The unexpected issue of `b1.txt` and `b2.txt` not being generated is fixed. (This fix is from the lifebit copy; I will bring the changes to this JAX copy as well. Going forward this will be the only repo anyway, as it can now be imported.)
Fix description: there was a channel typo, so the proper files could not be fetched into the process.
After the fix, I tested twice, and in both cases it was able to generate `b1.txt` and `b2.txt`.
As reported in this Slack thread, the rmats process sometimes fails, seemingly at random (other times it works as expected), with the following error.
Failed jobs -
Successful jobs -
The problem seems to be with defining b1 and b2. It looks like b1 is made correctly but for some reason b2 is blank.
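One way to catch a blank group file before rMATS runs is to validate both files up front. A minimal sketch, assuming each file holds a comma-separated list of BAM paths (the format rMATS documents for `--b1`/`--b2`); `read_group` is a hypothetical helper and is not part of the pipeline.

```python
import tempfile
from pathlib import Path


def read_group(path):
    """Return the BAM paths listed in a group file, or raise if the file is blank."""
    text = Path(path).read_text().strip()
    bams = [p.strip() for p in text.split(',') if p.strip()]
    if not bams:
        raise ValueError(f"{path} is empty: no BAM paths found")
    return bams


# Illustrative files only: a populated b1.txt and a blank b2.txt.
with tempfile.TemporaryDirectory() as workdir:
    b1 = Path(workdir) / 'b1.txt'
    b2 = Path(workdir) / 'b2.txt'
    b1.write_text('sampleA.bam,sampleB.bam\n')
    b2.write_text('')

    print(read_group(b1))
    try:
        read_group(b2)
    except ValueError as err:
        print(err)
```

Failing early like this would at least turn the intermittent rMATS error into a message that names the blank file.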
This has something to do with the logic mentioned here that produces the two files `b1.txt` and `b2.txt`. I checked the error from the back end (work dir): although both files are generated, `b2.txt` is empty.
Reproduction attempt
To reproduce this, I extracted this script from main.nf:
test_group.nf
```nextflow
bams = "old_job/MYC_high_vs_low_bams_forCloudOS_updated2.csv"
rmats_pairs = "old_job/BRCA_MYC_low_v_high_rmatsPairs_revised3.txt"

// Read the BAMs csv (name, bam, bai) and fork it into two channels
Channel
  .fromPath(bams)
  .ifEmpty { exit 1, "Cannot find BAMs csv file : ${bams}" }
  .splitCsv(skip:1)
  .map { name, bam, bai -> [ name, file(bam), file(bai) ] }
  .into { indexed_bam; indexed_bam_rmats }

// Keep only [name, bam] for the rMATS grouping
indexed_bam_rmats
  .map { name, bam, bai -> [name, bam] }
  .set { bam }

// Read the rMATS pairs file: one row per comparison (id, b1 samples, b2 samples)
Channel
  .fromPath(rmats_pairs)
  .ifEmpty { exit 1, "Cannot find rMATS pairs file : ${rmats_pairs}" }
  .splitCsv(sep:' ')
  .map { row ->
    def rmats_id = row[0]
    def b1 = row[1].toString().split(',')
    def b2 = row[2].toString().split(',')
    [ rmats_id, b1, b2 ]
  }
  .set { samples}

// Expand each row to [sample, group, rmats_id], join with the BAMs by sample
// name, then regroup into one item per rmats_id holding the b1 and b2 BAMs
samples
  .map { row ->
    def samples_rmats_id = []
    def rmats_id = row[0]
    def b1_samples = row[1]
    def b2_samples = row[2]
    b1_samples.each { sample ->
      samples_rmats_id.add([sample, 'b1', rmats_id])
    }
    b2_samples.each { sample ->
      samples_rmats_id.add([sample, 'b2', rmats_id])
    }
    samples_rmats_id
  }
  .flatMap()
  .combine(bam, by:0)
  .map { sample_id, b, rmats_id, bam -> [ rmats_id + b, rmats_id, bam] }
  .groupTuple()
  .map { b, rmats_id, bams -> [rmats_id[0], [b, bams]] }
  .groupTuple()
  .map { rmats_id, bams ->
    def b1_bams = bams[0][0].toString().endsWith('b1') ? bams[0] : bams[1]
    def b2_bams = bams[0][0].toString().endsWith('b2') ? bams[0] : bams[1]
    rmats_id_bams = b2_bams == null ? [ rmats_id, b1_bams[1], "no b2", true ] : [ rmats_id, b1_bams[1] , b2_bams[1], false ]
    rmats_id_bams
  }
  .set { bams }

//bams.view()

process rmats {
  echo true

  input:
  set val(rmats_id), file(bams), file(b2_bams), val(b1_only) from bams

  script:
  if (b1_only) {
    b1_bams = bams.join(",")
    b2_cmd = ''
    b2_flag = ''
    b2_config_cmd = ''
  } else {
    b1_bams = bams.join(",")
    b2_bams = b2_bams.join(",")
    b2_cmd = "echo b2.txt $b2_bams"
    b2_flag = "--b2 b2.txt"
    b2_config_cmd = "echo b2 b2.txt >> \$rmats_config"
  }
  """
  echo b1.txt $b1_bams
  $b2_cmd
  """
}
```
When run, it works completely fine, as expected (it creates both `b1.txt` and `b2.txt`).
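For comparison, the grouping that the channel operations in the script above perform can be written as a plain, deterministic function. This is a sketch in Python rather than Nextflow, with hypothetical sample names and paths. It assumes the pairs file has one row per comparison with an id, a comma-separated b1 list and a comma-separated b2 list, as the `splitCsv` and `split(',')` calls suggest, and it drops samples that have no BAM entry, which is how `combine(bam, by:0)` behaves as a join on the sample name.

```python
def group_bams(pairs_rows, bam_by_sample):
    """Map each rmats_id to its b1 and b2 BAM lists (hypothetical helper).

    pairs_rows: iterable of (rmats_id, "s1,s2", "s3,s4") tuples.
    bam_by_sample: dict from sample name to BAM path.
    """
    grouped = {}
    for rmats_id, b1_field, b2_field in pairs_rows:
        # Samples missing from bam_by_sample are skipped, like an inner join.
        b1 = [bam_by_sample[s] for s in b1_field.split(',') if s in bam_by_sample]
        b2 = [bam_by_sample[s] for s in b2_field.split(',') if s in bam_by_sample]
        grouped[rmats_id] = {'b1': b1, 'b2': b2}
    return grouped


# Illustrative inputs only; the real ones come from bams.csv and the pairs file.
rows = [('MYC_high_vs_low', 'A,B', 'C,D')]
bam_paths = {'A': 'A.bam', 'B': 'B.bam', 'C': 'C.bam', 'D': 'D.bam'}
result = group_bams(rows, bam_paths)
print(result)
```

If a function like this produced an empty b2 list for the real inputs, that would point to a mismatch between the sample names in the two files. Since the local test succeeds, the cause more likely lies elsewhere in the pipeline.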