davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0

leafcutter_cluster.py is not working on the new file format of .junc #163

Closed shannjiang closed 3 years ago

shannjiang commented 3 years ago

Hi,

I am trying to do intron clustering with leafcutter_cluster.py on the .junc files I generated.

I found that the Python script works fine on the old .junc file format, like this: [image: old_leafcutter_res] but I get an error on the new .junc file format, like this: [image: new_leafcutter_res]

Error info:

    [jiangs09@li03c03 leafcutter]$ python /sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/leafcutter/clustering/leafcutter_cluster.py -j /sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/temp/leafcutter.list.txt -m 50 -o PHG -l 500000
    scanning /sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/batch//Sample_178941-MgAs-RNA/Processed/RAPiD/leafcutter/Sample_178941-MgAs-RNA.reverse.output.junc...
    Traceback (most recent call last):
      File "/sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/leafcutter/clustering/leafcutter_cluster.py", line 489, in <module>
        main(options, libl)
      File "/sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/leafcutter/clustering/leafcutter_cluster.py", line 11, in main
        pool_junc_reads(libl, options)
      File "/sc/arion/projects/CommonMind/roussp01a/alz_phg_bin_zhang/rnaseq/leafcutter/clustering/leafcutter_cluster.py", line 46, in pool_junc_reads
        chrom, A, B, dot, counts, strand = lnsplit
    ValueError: too many values to unpack
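The failing line unpacks every split row into exactly six names, so any row with more columns raises this error. A minimal sketch of why, assuming the new files are 12-column BED12 rows as written by regtools; both example lines are invented for illustration and are not copied from the files above, and `unpack_six`/`n_columns` are hypothetical helpers:

```python
# Illustrative rows only; coordinates and counts are invented.
OLD_LINE = "chr1\t14829\t14970\t.\t5\t-\n"  # 6 columns (leafcutter 0.2.7 style)
NEW_LINE = (  # 12 columns (BED12, as assumed for regtools output)
    "chr1\t14780\t15020\tJUNC00000001\t5\t-\t14780\t15020\t255,0,0\t2\t49,50\t0,190\n"
)


def unpack_six(line):
    """Mirror the unpack in leafcutter_cluster.py: exactly six fields, else ValueError."""
    lnsplit = line.split()
    chrom, A, B, dot, counts, strand = lnsplit
    return chrom, int(A), int(B), int(counts), strand


def n_columns(line):
    """Count whitespace-separated columns, e.g. to tell the two layouts apart."""
    return len(line.split())


if __name__ == "__main__":
    print(unpack_six(OLD_LINE))
    try:
        unpack_six(NEW_LINE)
    except ValueError as err:
        print("ValueError:", err)
```

The exact ValueError wording differs between Python versions, but a 12-column row fails at the same unpack either way.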

The old format was generated with leafcutter 0.2.7, the last release (2017), and the new format was generated with the GitHub version from 2019/09/10, commit 54235a4. Is there a new leafcutter_cluster.py script I can use on the new format, or do I have to convert the new .junc files back to the old format? I guess the new format contains more information, right?

thanks,

Shan

jackhump commented 3 years ago

Dear Shan,

hello to a fellow Sinai person!

The 'new' junction format is created by regtools. To perform clustering on it, you must use the leafcutter_cluster_regtools.py script.
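The extra columns in the regtools files are the BED12 fields (name, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts). A minimal sketch of reading one such row, assuming chromStart/chromEnd span both anchor blocks so that the intron lies between the first and second block; this is an illustration of the BED12 layout, not the code in leafcutter_cluster_regtools.py, and `parse_regtools_junc`/`to_six_column` are hypothetical helpers:

```python
def parse_regtools_junc(line):
    """Parse one BED12 junction row into (chrom, intron_start, intron_end, count, strand).

    Assumption: chromStart/chromEnd include the two anchor blocks, so the intron
    runs from chromStart + first block size to chromEnd - second block size.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED12 columns, got {len(fields)}")
    chrom, start, end, _name, score, strand = fields[:6]
    # blockSizes may carry a trailing comma in some BED writers.
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    intron_start = int(start) + block_sizes[0]
    intron_end = int(end) - block_sizes[1]
    return chrom, intron_start, intron_end, int(score), strand


def to_six_column(line):
    """Rewrite a BED12 junction row in the older six-column layout (hypothetical helper)."""
    chrom, a, b, count, strand = parse_regtools_junc(line)
    return f"{chrom}\t{a}\t{b}\t.\t{count}\t{strand}"
```

Converting back this way is possible in principle, but the sketch does not check that its coordinates match the convention leafcutter 0.2.7 wrote, so running leafcutter_cluster_regtools.py directly on the regtools files is the safer route.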

Users have reported issues with leafcutter_cluster_regtools.py. These can be due either to chromosome naming (use the --checkchrom flag to throw out any non-standard chromosomes) or to "?" in the strand column, which happens when the -s parameter in regtools does not match the strandedness of your library.
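Both failure modes can be screened for before clustering. A minimal sketch, assuming a "standard" chromosome means 1-22, X or Y with an optional chr prefix (the rule --checkchrom actually applies may differ, e.g. for chrM) and that the strand is in column 6, as in BED; `screen_junctions` and `is_standard_chrom` are hypothetical helpers, not part of leafcutter:

```python
import re

# Assumed definition of "standard": 1-22, X, Y, optionally "chr"-prefixed.
STANDARD_CHROM = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$")


def is_standard_chrom(chrom):
    return bool(STANDARD_CHROM.match(chrom))


def screen_junctions(lines):
    """Return (kept_lines, n_dropped_chrom, n_unknown_strand).

    Rows on non-standard contigs are dropped. Rows whose strand (column 6)
    is "?" are kept but counted, since many of them suggest the regtools -s
    setting does not match the library's strandedness.
    """
    kept = []
    dropped_chrom = 0
    unknown_strand = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not is_standard_chrom(fields[0]):
            dropped_chrom += 1
            continue
        if fields[5] == "?":
            unknown_strand += 1
        kept.append(line)
    return kept, dropped_chrom, unknown_strand
```

A large unknown-strand count points toward rerunning regtools with an -s value that matches the library, rather than simply deleting those rows.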

best wishes,

Jack

shannjiang commented 3 years ago

Dear Jack,

The new python script is working on the new format!

Thanks! I just noticed you are also at Sinai, lol. Smart of you to locate me through the path info, lol.

Shan