egaffo / circompara2

Improved bioinformatic pipeline to identify and quantify circRNA expression from RNA-seq data by combining multiple circRNA detection methods

resume command ??? #8

Open hafizmtalha opened 2 years ago

hafizmtalha commented 2 years ago

Is there any option to resume the circompara2 command from the last completed point?

egaffo commented 2 years ago

Not really. You can remove all the files generated after the point you want to restart from (or move them into another directory, in case you want to restore them later). At the next run, circompara2 will detect which files are missing and rerun the corresponding tasks. If instead circompara2 was interrupted (for instance, by a power loss), check in your log which tasks did not finish and delete any partial output files of those tasks, then launch circompara2 again. You can use the "-n" option (dry run) to see which tasks circompara2 would perform, without actually executing them.
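A minimal sketch of the "move them into another directory" step, in case it helps. The helper name set_aside_newer_files and its arguments are hypothetical and not part of circompara2. It assumes that file modification time is a good enough proxy for "generated after the point you want", which may not hold if files were touched or copied in the meantime.

```python
import shutil
from pathlib import Path


def set_aside_newer_files(run_dir, backup_dir, cutoff_epoch):
    """Move files modified after cutoff_epoch from run_dir into backup_dir.

    Hypothetical helper, not part of circompara2. Relative paths are
    preserved so the files can be moved back later if needed.
    Returns the list of relative paths that were moved.
    """
    run_dir = Path(run_dir).resolve()
    backup_dir = Path(backup_dir).resolve()
    # Refuse a backup directory inside the run directory: its content
    # would otherwise be scanned as if it were pipeline output.
    if backup_dir == run_dir or run_dir in backup_dir.parents:
        raise ValueError("backup_dir must be outside run_dir")

    moved = []
    # sorted() materialises the listing before any file is moved.
    for path in sorted(run_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime <= cutoff_epoch:
            continue  # produced before the chosen point: keep in place
        rel = path.relative_to(run_dir)
        target = backup_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(rel)
    return moved
```

After setting the files aside, a dry run with the "-n" option shows which tasks would be rerun before anything is actually executed.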

hafizmtalha commented 2 years ago

OK, let me check. If I rerun the whole pipeline with only a few samples, or run the samples one by one, can I merge the final result files?

egaffo commented 1 year ago

You'll have to do it "by hand" with your own custom code.
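As a rough illustration of what such custom code could look like, here is a sketch that outer-joins per-run count tables using only the Python standard library. The function name merge_count_tables is hypothetical, and the layout is an assumption: each CSV is taken to have one or more identifier columns plus one count column per sample. The actual column names in CirComPara2 output are not given in this thread, so the caller has to supply them.

```python
import csv


def merge_count_tables(csv_paths, key_columns):
    """Outer-join count tables on key_columns; absent counts become "0".

    Hypothetical helper. Assumes every CSV has the key columns plus one
    column per sample. If two files share a sample column name, values
    from the later file overwrite the earlier ones.
    Returns (header, rows), with rows sorted by key.
    """
    merged = {}          # key tuple -> {sample column: value}
    sample_columns = []  # sample columns in first-seen order
    for path in csv_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Register sample columns even if the file has no data rows.
            for col in reader.fieldnames or []:
                if col not in key_columns and col not in sample_columns:
                    sample_columns.append(col)
            for row in reader:
                key = tuple(row[col] for col in key_columns)
                counts = merged.setdefault(key, {})
                for col, value in row.items():
                    if col not in key_columns:
                        counts[col] = value
    header = list(key_columns) + sample_columns
    rows = [
        list(key) + [counts.get(col, "0") for col in sample_columns]
        for key, counts in sorted(merged.items())
    ]
    return header, rows
```

The result can be written back with csv.writer. Filling missing entries with 0 is a design choice that is reasonable for raw counts (a circRNA not reported in a run was not detected there), but it is not necessarily appropriate for other kinds of values.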

ChengxuanChen10 commented 7 months ago

I am facing the same problem: I ran the samples one by one and now want to merge them. How do you do this? At which step should I start to 'do it by hand'? Thanks.

egaffo commented 7 months ago

You can now use the combine_ccp2_runs() function of the ccp2tools R package I am developing to get the combined results in your R scripts. The function takes as input a list of CirComPara2 run directories (one per sample in your case). It knows the structure of the CirComPara2 output and automatically merges the preprocessing statistics tables and the expression matrices of circRNAs, of linearly spliced reads on the backsplices, and of genes/linear transcripts (via tximport). Install the ccp2tools package in R through BiocManager, check the documentation of the combine_ccp2_runs() function, and let me know if it helps.

ChengxuanChen10 commented 7 months ago

Thank you for your reply! combine_ccp2_runs() is a great help for combining the results, but I get this error message:

> combine_ccp2_runs(files)
Merging read statistics...
Merging read statistic files:
ac1/read_statistics/read_stats_collect/processing_and_mapped_read_counts.csv
ac2/read_statistics/read_stats_collect/processing_and_mapped_read_counts.csv
ac3/read_statistics/read_stats_collect/processing_and_mapped_read_counts.csv
ac4/read_statistics/read_stats_collect/processing_and_mapped_read_counts.csv
Combining BJR counts from 4 projects...
Error in eval(bysub, x, parent.frame()) : object 'strand' not found

It seems that a column named 'strand' is required in the circular_expression/circrna_analyze/counts/bks.counts.union.csv file, but I did not find any strand information in that file.

egaffo commented 7 months ago

Are your reads stranded, and did you set the strandedness parameter in the vars.py file?

By default, combine_ccp2_runs() assumes stranded sequencing libraries (there is an is_stranded parameter). This works only if you set HISAT2_EXTRA_PARAMS in the vars.py configuration file (for instance, HISAT2_EXTRA_PARAMS = '--rna-strandness RF' for usual Illumina stranded libraries).

Otherwise, try setting is_stranded = FALSE in the combine_ccp2_runs() parameters.

ChengxuanChen10 commented 7 months ago

OK. Thank you!