WGLab / LIQA

Long-read Isoform Quantification and Analysis

Error in quantifying isoform expression #1

Closed maxh190 closed 6 months ago

maxh190 commented 3 years ago

To Whom It May Concern,

When trying to quantify isoform expression for a dataset (attached) by "python $LIQA_PATH/LIQA.py -task quantify -refgene $refgene_PATH/chr1.refgene -bam $data_PATH/test1.sort.bam -out test1.isoform -max_distance 10", I got the following error: LIQA.py -task quantify -refgene -bam -out -max_distance -f_weight

Can you please let me know how to prepare for a "weight of F function"?

When trying to generate isoform relative abundances for a dataset (attached, test1.aln.sam.txt) using PennSeq, I got the following error:

Traceback (most recent call last):
  File "PennSeq/PennSeq.py", line 289, in <module>
    cigarMatchRead1, cigarNumberRead1, cigarMatchInfoCount1, cigarNumberInfoCount1 = getCigarStringInformation(readCigar, readName, 1)
  File "PennSeq/PennSeq.py", line 24, in getCigarStringInformation
    cigarNumberRead[cigarMatchInfoCount] = int(splitCigar[i])
ValueError: invalid literal for int() with base 10: '92=1'

The dataset (a sam file) was attached.

Thanks a lot for your help.

Max
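
The ValueError above is what one would expect if a CIGAR string containing the extended operators `=` (sequence match) and `X` (mismatch) is split on letters only, so that `92=1X` yields the token `92=1`. Both operators are defined in the SAM specification. The sketch below is not PennSeq's code; `parse_cigar` is a hypothetical helper that shows one way to tokenize a CIGAR string so that `=` and `X` are handled, and the letters-only split is only an assumption about how the failing token could arise.

```python
import re

# All CIGAR operations in the SAM specification, including '=' and 'X'.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs.

    Hypothetical helper for illustration; not part of PennSeq or LIQA.
    """
    if cigar == "*":  # SAM uses '*' when no CIGAR is available
        return []
    pairs = []
    pos = 0
    for match in _CIGAR_TOKEN.finditer(cigar):
        if match.start() != pos:  # unrecognized characters between tokens
            raise ValueError("malformed CIGAR string: %r" % cigar)
        pairs.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if pos != len(cigar):  # trailing characters that matched no token
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return pairs


# Assumed failure mode: splitting on letters only leaves '=' inside a token.
print(re.split(r"[A-Z]", "92=1X7="))  # ['92=1', '7=']

# Tokenizing with the full operator set keeps every length numeric.
print(parse_cigar("92=1X7="))  # [(92, '='), (1, 'X'), (7, '=')]
```

If a downstream tool only understands `M`, adjacent `=` and `X` runs can be merged into a single `M` run after parsing, since `M` covers both matches and mismatches.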

huyustats commented 3 years ago

Hi Max,

Thank you so much for your interest in using LIQA and PennSeq.

weight_F in LIQA lets the user specify the weight for bias correction in the estimation process. We usually recommend setting it to 1.

PennSeq only takes a BAM file as input. Please try converting the SAM file to a sorted and indexed BAM file for the analysis.

Thank you! Yu

maxh190 commented 3 years ago

Hi Yu,

Thank you very much for your prompt reply.

Actually, I did use a BAM file when I ran PennSeq, but there seems to be an error in getCigarStringInformation. As GitHub doesn't allow a user to upload a BAM file, I attached the SAM version of the test BAM file. Can you please convert the attached SAM file to a BAM file and test PennSeq for me? Is there a problem with the CIGAR information in the BAM/SAM file?

When I ran a command "python ~/program/LIQA/LIQA.py -task quantify -refgene ISOFORM_Compatible_Matrix_example -bam BAM_example.sorted.bam -out example.quantify -max_distance 10 -f_weight 1", I got the following error message:

RCAN3 467 reads detected...
Traceback (most recent call last):
  File "LIQA/bin/LRSeq_new.py", line 521, in <module>
    sumTheta += Alpha[i] + weightF * tmpisolength[i]
NameError: name 'tmpisolength' is not defined

Best,

Max

maxh190 commented 3 years ago

Hi Yu and Kai,

I am trying to use LIQA to analyze PacBio data. The test BAM file was generated by aligning a PacBio Circular Consensus Sequencing (CCS) FASTQ file against a macaque reference genome using minimap2. As GitHub doesn't allow a user to upload a BAM file, I sent you its SAM file earlier. Can you please help me solve the problems that I mentioned in my last message?

Please let me know if I need to provide other information.

Best, Max

huyustats commented 3 years ago

Hi Max,

Thank you for your comments! Please try the updated version of LIQA. The PennSeq on SourceForge is an old version. I will update it ASAP. Please use this version of PennSeq to see if it works: PennSeq.zip

Thanks, Yu

maxh190 commented 3 years ago

Hi Yu,

The version of PennSeq you sent works on my PacBio data without any problems. Thank you very much for your help!

When I ran the "Step 3: Detect differential splicing gene/isoform between conditions" of the LIQA by a command "python3 ~/program/LIQA/LIQA.py -task diff -ref $REF_PATH/chr1.refFile -est TESTING.relative_abundance", I got the following error: Argument '-r' is invalid!

So you may need to change the following line in LIQA.py from

myCommand = "python " + fileAbsPath + "/bin/Diff.py -r " + refFile + " -est " + estFile

to

myCommand = "python " + fileAbsPath + "/bin/Diff.py -ref " + refFile + " -est " + estFile

Then, I got the following messages:

Isoform compatible matrix conversion: Processing...
Traceback (most recent call last):
  File "LIQA/bin/bin/getCompatibleMatrix.py", line 114, in <module>
    ISO_INDEX_keys_sorted.sort()
AttributeError: 'dict_keys' object has no attribute 'sort'
Isoform compatible matrix conversion: Done!
Exon inclusion levels calculation: Processing...
Traceback (most recent call last):
  File "LIQA/bin/bin/getExonIncLvl.py", line 89, in <module>
    geneToConditions[gene] = geneToConditions[gene].rstrip(",")
AttributeError: 'collections.defaultdict' object has no attribute 'rstrip'
Exon inclusion levels calculation: Done!
DAS testing: Start to process...
Error in read.table(input.file[1]) : no lines available in input
Execution halted
DAS testing: Done!
Summarizing...
Traceback (most recent call last):
  File "LIQA/bin/bin/summarize.py", line 80, in <module>
    esttmp = isoEsts[gene][isoform][str(cdt)].rstrip(",")
AttributeError: 'collections.defaultdict' object has no attribute 'rstrip'
Done!

BTW, the ".relative_abundance" file generated by PennSeq has only four columns, while the LIQA website describes the file as having five. The fifth column [Column 5: condition group (1 or 2)] seems to be missing from the ".relative_abundance" file generated by PennSeq.

Could you let me know why I got the above errors?

Thanks a lot, Max
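
The two AttributeError messages in the output above are typical of Python 2 code run under python3, or of a nested defaultdict being read at a key that was never filled. The sketch below illustrates both patterns in isolation. It does not reproduce LIQA's scripts: `sorted_keys`, `nested_dict`, and `strip_trailing_comma` are hypothetical helpers, and the sample keys and values are made up for illustration.

```python
from collections import defaultdict


def sorted_keys(index):
    """Python 3 replacement for `keys = d.keys(); keys.sort()`.

    In Python 3, dict.keys() returns a view without a .sort() method,
    so sorted() is used to build a sorted list instead.
    """
    return sorted(index.keys())


def nested_dict():
    """Arbitrarily nested defaultdict (assumed structure, for illustration)."""
    return defaultdict(nested_dict)


def strip_trailing_comma(value):
    """Strip a trailing comma only when the value is really a string.

    Reading a missing key from a nested defaultdict silently creates and
    returns another defaultdict, which has no .rstrip() method.
    """
    if isinstance(value, str):
        return value.rstrip(",")
    return None


iso_index = {"iso2": 1, "iso1": 0}
print(sorted_keys(iso_index))

ests = nested_dict()
ests["geneA"]["iso1"]["1"] = "0.25,0.30,"
print(strip_trailing_comma(ests["geneA"]["iso1"]["1"]))  # filled entry
print(strip_trailing_comma(ests["geneA"]["iso1"]["2"]))  # missing entry
```

Because the first step in the log already failed, the later errors, including the empty input reported by read.table, may be knock-on effects of missing intermediate output rather than independent bugs.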

maxh190 commented 3 years ago

Hi Yu,

Besides the questions I asked in my previous message, I have one more question regarding the relative abundance of isoforms: I found that both LIQA and PennSeq can generate isoform relative abundances, although the formats of the two files differ. Do I have to use the one generated by PennSeq? As I mentioned in my previous message, that file has only four columns (rather than the five columns described on the LIQA website).

Thanks, Max

maxh190 commented 3 years ago

Hi Yu and Kai,

I hope you all had a wonderful holiday season. Can you please help me solve the problems that I raised in my previous two messages?

Thanks a lot, Max

huyustats commented 3 years ago

Hi Max,

Yes, LIQA and PennSeq both output isoform expression in similar formats. The difference is that LIQA is designed for long-read data, while PennSeq is for short-read data. For the DAS detection error, could you send the input data (the isoform estimate files) you used to huyu999999@gmail.com? I will test it for you.

Thanks!