ExpressionAnalysis / STAR-SEQR

RNA Fusion Detection and Quantification

nested renamer is not supported #28

Open Ajeet1699 opened 3 years ago

Ajeet1699 commented 3 years ago

Hi, I am running starseqr on my samples and am stuck on an error.

starseqr.py -1 sample_1.fastq.gz -2 sample_2.fastq.gz -m 1 -p starseqr_test -t 50 -i STAR_FUSION_LIB/ref_genome.fa.star.idx/ -g genomic.gtf -r genomic.fa -vv

2021-06-22 10:13 - INFO - ***STAR-SEQR**
2021-06-22 10:13 - INFO - CMD = /home/nipgr/software/STAR-SEQR/myenv/bin/starseqr.py -1 sample_1.fastq.gz -2 sample_2.fastq.gz -m 1 -p starseqr_test -t 50 -i STAR_FUSION_LIB/ref_genome.fa.star.idx/ -g genomic.gtf -r genomic.fa -vv
2021-06-22 10:13 - INFO - STAR-SEQR_version = 0.6.7
2021-06-22 10:13 - INFO - Starting to work on sample: /home/nipgr/Documents/chickpea/starseqr_test
2021-06-22 10:13 - INFO - Found input: sample_1.fastq.gz
2021-06-22 10:13 - INFO - Found input: sample_2.fastq.gz
2021-06-22 10:13 - INFO - Found input: genomic.fa
2021-06-22 10:13 - INFO - Found input: genomic.gtf
2021-06-22 10:13 - INFO - Starting STAR Alignment
2021-06-22 10:13 - INFO - STAR Command: STAR --readFilesIn sample_1.fastq.gz sample_2.fastq.gz --readFilesCommand zcat --runThreadN 50 --genomeDir STAR_FUSION_LIB/ref_genome.fa.star.idx --outFileNamePrefix starseqr_test_STAR-SEQR/starseqr_test. --chimScoreJunctionNonGTAG -1 --outSAMtype None --chimOutType Junctions SeparateSAMold --alignSJDBoverhangMin 5 --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 5 --outMultimapperOrder Random --outSAMattributes NH HI AS nM --chimSegmentMin 10 --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreSeparation 7 --chimSegmentReadGapMax 3 --chimFilter None --twopassMode None --alignSJstitchMismatchNmax 5 -1 5 5 --chimMainSegmentMultNmax 10
2021-06-22 10:14 - INFO - b'Jun 22 10:13:02 ..... started STAR run\nJun 22 10:13:02 ..... loading genome\nJun 22 10:13:04 ..... started mapping\nJun 22 10:14:37 ..... finished mapping\nJun 22 10:14:37 ..... finished successfully\n'
2021-06-22 10:14 - INFO - STAR Alignment Finished!
2021-06-22 10:14 - INFO - Importing junctions
2021-06-22 10:14 - INFO - Number of candidates removed due to Mitochondria filter: 0
2021-06-22 10:14 - INFO - Removing duplicate reads
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_cigar_overhang in a pool of 50 workers using map_async protocol
2021-06-22 10:14 - INFO - Ordering junctions
2021-06-22 10:14 - INFO - Normalizing junctions
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_normalize_jxns in a pool of 50 workers using map_async protocol
2021-06-22 10:14 - INFO - Getting gene strand and flipping info as necessary
2021-06-22 10:14 - INFO - Begin multiprocessing of function apply_jxn_strand in a pool of 50 workers using map_async protocol
2021-06-22 10:15 - INFO - Begin multiprocessing of function apply_flip_func in a pool of 50 workers using map_async protocol
2021-06-22 10:15 - INFO - Aggregating junctions
Traceback (most recent call last):
  File "/home/user/software/STAR-SEQR/myenv/bin/starseqr.py", line 622, in <module>
    sys.exit(main())
  File "/home/user/software/STAR-SEQR/myenv/bin/starseqr.py", line 345, in main
    jxn_summary = su.core.count_jxns(jxns)
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/starseqr_utils/core.py", line 123, in count_jxns
    col)), ('counts', 'count')])), ('overhang_len', 'max')])).reset_index()
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/pandas/core/groupby/generic.py", line 940, in aggregate
    result, how = self._aggregate(func, args, **kwargs)
  File "/home/user/software/STAR-SEQR/myenv/lib64/python3.6/site-packages/pandas/core/base.py", line 351, in _aggregate
    raise SpecificationError("nested renamer is not supported")
pandas.core.base.SpecificationError: nested renamer is not supported

Ajeet1699 commented 3 years ago

I printed the DataFrame in "starseqr.py" just before it is passed to the function "count_jxns" in core.py.

The column header is:

index, chrom1, pos1, str1, chrom2, pos2, str2, jxntype, jxnleft, jxnright, readid, base1, cigar1, base2, cigar2, identity, overhang_len, order, name, test_strand, flip

So maybe count_jxns is trying to aggregate columns that do not exist? This is the call in core.py:

new_df = grouped_df.agg(OrderedDict([('readid', OrderedDict([('reads', lambda col: ','.join( col)), ('counts', 'count')])), ('overhang_len', 'max')])).reset_index()

Or is it some other error?
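
One way to test the missing-column idea is a minimal sketch that checks the printed header for the two columns the quoted aggregation reads (readid and overhang_len). The header string is copied from the output above; the variable names are only for illustration, and the grouping columns are not shown in this thread, so they are not checked.

```python
# Header as printed above, copied verbatim from the comment.
header_text = (
    "index, chrom1, pos1, str1, chrom2, pos2, str2, jxntype, jxnleft, "
    "jxnright, readid, base1, cigar1, base2, cigar2, identity, "
    "overhang_len, order, name, test_strand, flip"
)
header = header_text.split(", ")

# Columns that the quoted agg() call aggregates.
needed = {"readid", "overhang_len"}

# Any needed column that is absent from the printed header.
missing = needed - set(header)
print(sorted(missing))  # → []
```

Both columns are present, so the error more likely comes from the nested dict passed to agg than from a missing column.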

sjewell-biociphers commented 1 year ago

I was able to fix this by installing pandas==0.25.0

... not really sure how great of a solution this is but oh well.
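
A possible alternative to pinning pandas, as a sketch only: pandas 1.0 removed the dict-of-dicts ("nested renamer") form of agg, and named aggregation (available since pandas 0.25) can express the same reads/counts/max summary. The function name count_jxns_sketch and the group_cols argument are hypothetical, not STAR-SEQR's code, and the real grouping columns in core.py are not shown in this thread.

```python
import pandas as pd


def count_jxns_sketch(df, group_cols):
    """Summarise junction reads per group using named aggregation.

    Hypothetical stand-in for the quoted agg() call; group_cols is an
    assumption because the real grouping columns are not shown here.
    """
    grouped = df.groupby(group_cols)
    summary = grouped.agg(
        # Comma-joined read IDs, as the original lambda produced.
        reads=("readid", lambda col: ",".join(col)),
        # Number of reads supporting each junction.
        counts=("readid", "count"),
        # Longest overhang seen in the group.
        overhang_len=("overhang_len", "max"),
    )
    return summary.reset_index()


if __name__ == "__main__":
    # Toy data for illustration only; not taken from a real run.
    toy = pd.DataFrame(
        {
            "name": ["a", "a", "b"],
            "readid": ["r1", "r2", "r3"],
            "overhang_len": [10, 25, 7],
        }
    )
    print(count_jxns_sketch(toy, ["name"]))
```

The output columns here are flat (reads, counts, overhang_len), so any code in core.py that expects the old nested column layout would need adjusting; that part is untested here.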