COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Alevin: Problem with PyPI vpolo ["Reading Alevin’s bfh (big freaking hash) file" section of Alevin tutorial] #650

Open taeyon998 opened 3 years ago

taeyon998 commented 3 years ago

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)? Alevin single-cell mode.

Describe the bug Hi, I ran into a problem while following this tutorial: https://combine-lab.github.io/alevin-tutorial/2018/output-format/ . In the "Reading Alevin’s bfh (big freaking hash) file" section there are only two lines to run, and the second one, the call to parser.read_bfh(), fails.

It raises pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 110446, saw 20

I tried to diagnose the problem by inspecting the input bfh.txt file. Line 110446 is not the only one affected: many other lines also have more than one field. So the real question is whether bfh.txt should have only one field per line. If so, the input bfh.txt file is malformed. If not, the parser is at fault, since it should handle lines with more than one field.
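The manual inspection described above can be done in a few lines of Python. This is a minimal sketch, not part of vpolo or the tutorial: `field_count_histogram` is a hypothetical helper, and the tab delimiter is an assumption about how the file is split.

```python
from collections import Counter


def field_count_histogram(path, delimiter="\t"):
    """Return a Counter mapping number-of-fields -> number of lines.

    Hypothetical diagnostic helper; the tab delimiter is an assumption.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            counts[len(line.split(delimiter))] += 1
    return counts
```

If the result has more than one key, the file mixes lines of different widths, and a parser that fixes the column count from the first line will fail on it.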

Expected behavior The bfh.txt file should be parsed. In other words, the call parser.read_bfh("<PATH to alevin output folder>", "<PATH to t2g file>") should run without error, as described in the tutorial: https://combine-lab.github.io/alevin-tutorial/2018/output-format/

Screenshots Screenshot of error: [image]

taeyon998 commented 3 years ago

I solved my own problem. It was my mistake: I passed the path to bfh.txt as the second argument of parser.read_bfh(). The second argument should be the path to "txp2gene.tsv" instead.

Solution: Pass the path to "txp2gene.tsv" as the second argument of parser.read_bfh().
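To catch this kind of mix-up earlier, a quick sanity check on the second argument may help. The sketch below assumes the transcript-to-gene map is a tab-separated file with exactly two non-empty columns per line (transcript, gene). `looks_like_t2g` is a hypothetical helper, not part of vpolo, and it is only a heuristic.

```python
def looks_like_t2g(path, max_lines=1000):
    """Heuristic check that `path` resembles a transcript-to-gene map.

    Assumption: every non-blank line has exactly two non-empty,
    tab-separated fields. Only the first `max_lines` lines are checked.
    """
    seen = 0
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != 2 or not all(fields):
                return False
            seen += 1
            if seen >= max_lines:
                break
    return seen > 0  # an empty file is not a valid map
```

Calling `looks_like_t2g("<PATH to t2g file>")` before `parser.read_bfh(...)` and stopping when it returns False would turn the pandas tokenizing error into a clearer "wrong file" message.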