COMBINE-lab / minnow

std::invalid_argument: stoi: no conversion #10

Closed Acribbs closed 4 years ago

Acribbs commented 4 years ago

Hi,

I compiled the latest code from GitHub and get the following error while running in splatter-mode. Any help would be much appreciated, thanks.

../../../minnow/build/src/minnow simulate -i . -o output.dir/ -r ../../data/human_transcriptome.fasta -w ../../data/737K-august-2016.txt --splatter-mode --g2t ../../data/human_t2g.tsv --PCR 5  -e 0.001 -p 2 --dbg --gfa ../../data/human_transcriptome_debruijn.gfa 
Input directory  .
Reference Fasta ../../data/human_transcriptome.fasta
Number of PCR cycles 5
Erorr rate 0.001
Numeber of threads 2
[2020-06-02 21:37:18.385] [minnow-Log] [info] Reading reference sequences ...
replaced 4 non-ACGT nucleotides with random nucleotides
Transcript file is read
[2020-06-02 21:37:20.053] [minnow-Log] [info] Reference sequence is loaded ...
Skipped 3016 transcripts because either short or not present in reference 
[2020-06-02 21:37:20.444] [minnow-Log] [info] Number of genes in the txp2gene file: 55327
[2020-06-02 21:37:20.445] [minnow-Log] [info] Parsing ./quants_mat_cols.txt
=======================Reading Splatter Matrix=====================
[2020-06-02 21:37:20.489] [minnow-Log] [info] 54958 cells are present 
[2020-06-02 21:37:20.489] [minnow-Log] [info] Start parsing Splatter output
[2020-06-02 21:37:20.489] [minnow-Log] [info] Parsing ./quants_mat_rows.txt
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stoi: no conversion
Abort trap: 6
hiraksarkar commented 4 years ago

Hi Adam,

Thanks for giving minnow a try. A few questions:

  1. Are you running on the refactor (current default) branch?
  2. This looks like a problem with the input files themselves. Can you please share the files in the input directory (the one given with the -i option, here .)? At least 3 files are required there. I will try to reproduce the error. Thanks!
Acribbs commented 4 years ago

Many thanks for your quick response.

  1. Yes, I compiled from the refactor branch.
  2. I have added files here: https://drive.google.com/drive/folders/1IiOV3sm4P8uZD2-2ReRFegAYRZ79jppj?usp=sharing
hiraksarkar commented 4 years ago

Dear @Acribbs, thanks for linking the files. There are a few issues:

  1. Since you are not running minnow in alevin-mode, it runs in splatter-mode by default, which means it assumes the rows to be the set of genes (quants_mat_rows.txt) and the columns to be the set of cells (quants_mat_cols.txt).
  2. I saw that you used the GENCODE gene names. If you don't want minnow to assign gene names and would rather have it use the ones from quants_mat_rows.txt, please also pass the --custom flag.
  3. Finally, and most importantly, the quants_mat.csv file does not tally with the other two files. For example, here is a brief snapshot of the line counts:
    ➜  build_refactor git:(refactor) ✗ cat adam_cribbs_data/minnow/quants_mat.csv | cut -d, -f3 | sort| uniq | wc -l 
    132
    ➜  build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat.csv
    8001 adam_cribbs_data/minnow/quants_mat.csv
    ➜  build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat_cols.txt
    1000 adam_cribbs_data/minnow/quants_mat_cols.txt
    ➜  build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat_rows.txt
    54958 adam_cribbs_data/minnow/quants_mat_rows.txt

    This suggests that you specify the matrix to be 54958 x 8001, but the actual CSV file has dimension 8001 x 132. This mismatch is what causes the parsing error.

Additionally,

  1. Looking at the content of the CSV file, I see that it contains row and column IDs such as gene_1 and col_1, which minnow can't parse. In short, minnow assumes the CSV file holds only integer/float values.

I believe it will run once these are fixed. Feel free to contact me if you have more issues. I will keep the thread open.
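
The two checks described above (matching dimensions and numeric-only entries) can be run before invoking minnow. The following is only a rough sketch, not part of minnow: the function names are hypothetical, and it assumes that quants_mat.csv has one line per entry in quants_mat_rows.txt and one comma-separated field per entry in quants_mat_cols.txt. If your matrix is stored transposed, swap the two counts.

```python
import csv
from pathlib import Path


def count_lines(path):
    """Count the non-empty lines in a text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_splatter_inputs(input_dir):
    """Return a list of human-readable problems found in the input directory.

    Assumption (not confirmed by minnow's documentation): the CSV has one
    line per entry in quants_mat_rows.txt and one field per entry in
    quants_mat_cols.txt.
    """
    input_dir = Path(input_dir)
    n_rows = count_lines(input_dir / "quants_mat_rows.txt")
    n_cols = count_lines(input_dir / "quants_mat_cols.txt")

    problems = []
    widths = set()
    n_lines = 0
    with open(input_dir / "quants_mat.csv", newline="") as handle:
        for line_no, record in enumerate(csv.reader(handle), start=1):
            if not record:
                continue  # skip blank lines
            n_lines += 1
            widths.add(len(record))
            for field in record:
                try:
                    float(field)
                except ValueError:
                    problems.append(
                        f"line {line_no}: non-numeric entry {field!r}"
                    )
                    break  # one report per line is enough

    if len(widths) > 1:
        problems.append(f"ragged CSV: line widths {sorted(widths)}")
    elif widths:
        width = widths.pop()
        if (n_lines, width) != (n_rows, n_cols):
            problems.append(
                f"CSV is {n_lines} x {width}, but rows file has {n_rows} "
                f"entries and cols file has {n_cols}"
            )
    return problems
```

Calling `check_splatter_inputs(".")` in the directory passed to -i and printing each returned string lists header cells such as gene_1 and any shape mismatch. An empty list means neither problem was detected under the stated assumption.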

Acribbs commented 4 years ago

Many thanks for your detailed response; you are indeed correct regarding the matrix. I will get back to you once I have fixed these errors.

Acribbs commented 4 years ago

Many thanks for your help with the initial problem; that seems to be fixed. However, I have a separate issue, and I'm not sure why it's being triggered: the --countProb file is missing. Is there an example of what this file should be? I can't see it documented. Again, thanks for your help.

~/Documents/minnow/build/src/minnow simulate -i . -o output.dir/ -r ../../data/human_transcriptome.fasta -w ../../data/737K-august-2016.txt --splatter-mode --g2t ../../data/human_t2g.tsv --PCR 5  -e 0.001 -p 2 --dbg --gfa ../../data/human_transcriptome_debruijn.gfa --custom
Input directory  .
Reference Fasta ../../data/human_transcriptome.fasta
Number of PCR cycles 5
Erorr rate 0.001
Numeber of threads 2
[2020-06-03 13:20:16.987] [minnow-Log] [info] Reading reference sequences ...
replaced 4 non-ACGT nucleotides with random nucleotides
Transcript file is read
[2020-06-03 13:20:18.546] [minnow-Log] [info] Reference sequence is loaded ...
Skipped 3016 transcripts because either short or not present in reference 
[2020-06-03 13:20:18.909] [minnow-Log] [info] Number of genes in the txp2gene file: 55327
[2020-06-03 13:20:18.909] [minnow-Log] [info] Parsing ./quants_mat_cols.txt
=======================Reading Splatter Matrix=====================
[2020-06-03 13:20:18.910] [minnow-Log] [info] 140 cells are present 
[2020-06-03 13:20:18.910] [minnow-Log] [info] Start parsing Splatter output
[2020-06-03 13:20:18.910] [minnow-Log] [info] Parsing ./quants_mat_rows.txt
In Splatter: Number of genes processed : 8000==================Done Parsing Splatter Matrix==================
[2020-06-03 13:20:19.012] [minnow-Log] [info] Splatter matrix is read, with dimension 140 x 8000

 !!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments... 
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths 
Overlap size 101
Done with GFA 
Equivalece class size 619336    trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering 
Equivalece class size 513113    trSegmentMap size 196254    transcript map size 197385
[2020-06-03 13:20:30.319] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000[2020-06-03 13:20:30.351] [minnow-Log] [warning] Skipping 8000 genes, gene pool size of de-Bruijn graph 54809
[2020-06-03 13:20:30.355] [minnow-Log] [info] Truncated the matrix to dimension 140 x 0
RSPD ::: 
[2020-06-03 13:20:30.355] [minnow-Log] [warning] counting hard coded count prob file
[2020-06-03 13:20:30.355] [minnow-Log] [error] Invoked with DBG mode but --countProb file is not present
[2020-06-03 13:20:30.355] [minnow-Log] [error] Add --countProb option with the file countProb_pbmc_4k.txt
hiraksarkar commented 4 years ago

Hi, when --dbg mode is used, minnow tries to replicate the multi-mapping histogram of a real sample in order to mimic its behavior. For human, one such file is already provided; it was constructed from a PBMC sample and is linked here: https://github.com/COMBINE-lab/minnow/blob/refactor/data/hg/countProb_pbmc_4k.txt So you can run the experiment with the added option --countProb countProb_pbmc_4k.txt

Acribbs commented 4 years ago

Great, this is exactly the same dataset that I want to use too.

However, when specifying this extra argument I get the following error:

 !!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments... 
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths 
Overlap size 101
Done with GFA 
Equivalece class size 619336    trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering 
Equivalece class size 513113    trSegmentMap size 196254    transcript map size 197385
[2020-06-03 13:32:08.589] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000RSPD ::: 
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stoul: no conversion
Abort trap: 6
Acribbs commented 4 years ago

Sorry, I didn't post the full trace:

~/Documents/minnow/build/src/minnow simulate -i . -o output.dir/ -r ../../data/human_transcriptome.fasta -w ../../data/737K-august-2016.txt --splatter-mode --g2t ../../data/human_t2g.tsv --PCR 5  -e 0.001 -p 2 --dbg --gfa ../../data/human_transcriptome_debruijn.gfa --countProb ../../data/countProb_pbmc_4k.txt
Input directory  .
Reference Fasta ../../data/human_transcriptome.fasta
Number of PCR cycles 5
Erorr rate 0.001
Numeber of threads 2
[2020-06-03 13:31:55.512] [minnow-Log] [info] Reading reference sequences ...
replaced 4 non-ACGT nucleotides with random nucleotides
Transcript file is read
[2020-06-03 13:31:57.008] [minnow-Log] [info] Reference sequence is loaded ...
Skipped 3016 transcripts because either short or not present in reference 
[2020-06-03 13:31:57.339] [minnow-Log] [info] Number of genes in the txp2gene file: 55327
[2020-06-03 13:31:57.339] [minnow-Log] [info] Parsing ./quants_mat_cols.txt
=======================Reading Splatter Matrix=====================
[2020-06-03 13:31:57.340] [minnow-Log] [info] 140 cells are present 
[2020-06-03 13:31:57.340] [minnow-Log] [info] Start parsing Splatter output
[2020-06-03 13:31:57.340] [minnow-Log] [info] Parsing ./quants_mat_rows.txt
In Splatter: Number of genes processed : 8000==================Done Parsing Splatter Matrix==================
[2020-06-03 13:31:57.444] [minnow-Log] [info] Splatter matrix is read, with dimension 140 x 8000

 !!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments... 
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths 
Overlap size 101
Done with GFA 
Equivalece class size 619336    trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering 
Equivalece class size 513113    trSegmentMap size 196254    transcript map size 197385
[2020-06-03 13:32:08.589] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000RSPD ::: 
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stoul: no conversion
Abort trap: 6
Acribbs commented 4 years ago

Sorry, scratch that last issue for the moment. I think there was a problem with downloading the data (problems with working from home).

hiraksarkar commented 4 years ago

Can you re-upload the files? Something in them is still not right, because minnow is skipping all 8000 genes. One reason could be a difference in gene names: the gene names in the rows file do not seem to have .s. Can you make sure they are the same as those used in ../../data/human_t2g.tsv? If not, minnow won't find the matching gene names.

In any case, if you can upload the files, I can take a much deeper look around the weekend. Unfortunately, I will be a bit occupied until then.
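
One way to test the gene-name hypothesis above is to count how many names from the rows file also occur in the transcript-to-gene file, both as written and with anything after the first "." stripped. This is a minimal sketch with hypothetical helper names, and it assumes the t2g file is tab-separated with the gene name in the second column, which should be verified against the actual file.

```python
def load_t2g_genes(path):
    """Collect gene names from a transcript-to-gene TSV.

    Assumption: tab-separated, gene name in the second column.
    """
    genes = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                genes.add(fields[1])
    return genes


def strip_version(gene_id):
    """Drop everything from the first '.' onward, e.g. 'ENSG01.3' -> 'ENSG01'."""
    return gene_id.split(".", 1)[0]


def gene_overlap(matrix_genes, t2g_genes):
    """Count matrix genes found in the t2g set, exactly and ignoring '.' suffixes."""
    matrix_genes = set(matrix_genes)
    t2g_genes = set(t2g_genes)
    exact = len(matrix_genes & t2g_genes)
    stripped_t2g = {strip_version(g) for g in t2g_genes}
    stripped_matrix = {strip_version(g) for g in matrix_genes}
    return {
        "total": len(matrix_genes),
        "exact": exact,
        "ignoring_version": len(stripped_matrix & stripped_t2g),
    }
```

If "exact" is zero while "ignoring_version" is close to "total", the two files most likely differ only in the suffix after the dot, which would be consistent with every gene being skipped.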

Acribbs commented 4 years ago

It seems to be running, I will let you know if I have any further issues. Thanks for your help.

Acribbs commented 4 years ago

The countProb_pbmc_4k.txt was malformed (maybe patchy internet) but looks good now.
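
The thread does not say how the file ended up malformed. One common way a download from GitHub goes wrong is fetching a `blob` page URL, which returns the HTML page rather than the file content. The sketch below (hypothetical helper names, not part of minnow) derives the raw-content URL from a `blob` URL and checks whether a saved file looks like HTML. It makes no assumption about the internal format of the countProb file.

```python
from urllib.parse import urlsplit


def blob_to_raw_url(url):
    """Turn a github.com '/OWNER/REPO/blob/BRANCH/PATH' URL into its raw URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def looks_like_html(path):
    """Return True if the file starts with an HTML doctype or <html> tag."""
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

A file for which `looks_like_html` returns True is worth re-downloading from the raw URL. A False result only rules out this one failure mode; a truncated download would need a different check, such as comparing the file size with the copy in the repository.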

Acribbs commented 4 years ago

I will close this, as the software now runs. I haven't tested the output, but if I run into another problem I will open a separate issue. Thanks very much for your help, and congratulations on your very useful piece of software.