Closed Acribbs closed 4 years ago
Hi Adam,
Thanks for giving minnow a try. A few questions:
-i
option in .
) (there should be at least 3 files that are required)
I will try to reproduce the error.
Thanks!
Many thanks for your quick response.
Dear @Acribbs, thanks for linking the files. There are a few issues:
alevin-mode
it is running by default in splatter-mode, which means that it will assume the rows to be the set of genes (quants_mat_rows.txt) and the columns to be the set of cells (quants_mat_cols.txt).
quants_mat_rows.txt, please also use the --custom flag.
Finally, and most importantly, the quants_mat.csv file does not tally with the other two files. For example, here is a brief snapshot of the file line counts:
➜ build_refactor git:(refactor) ✗ cat adam_cribbs_data/minnow/quants_mat.csv | cut -d, -f3 | sort| uniq | wc -l
132
➜ build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat.csv
8001 adam_cribbs_data/minnow/quants_mat.csv
➜ build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat_cols.txt
1000 adam_cribbs_data/minnow/quants_mat_cols.txt
➜ build_refactor git:(refactor) ✗ wc -l adam_cribbs_data/minnow/quants_mat_rows.txt
54958 adam_cribbs_data/minnow/quants_mat_rows.txt
This suggests that you specify the matrix to be 54958 x 8001, but the actual CSV file has dimension 8001 x 132. This mismatch is what causes the parsing error.
Additionally, the CSV file contains entries such as gene_1 and col_1, which minnow can't parse; in short, minnow assumes the CSV file contains only integer/float values.
I believe that once these are fixed it will run. Feel free to contact me if you have more issues. I will keep the thread open.
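The two problems described above (a matrix whose shape disagrees with the label files, and non-numeric entries in the CSV) can be checked before running minnow. The following is only a minimal sketch, not part of minnow: the function names are hypothetical, and it assumes a comma-separated matrix with one label per line in the rows and columns files. Because the thread does not settle which axis holds genes, the shape check accepts either orientation.

```python
import csv
import io


def inspect_matrix(csv_text):
    """Return (n_rows, n_cols, bad_cells) for a comma-separated matrix.

    bad_cells lists (row_index, col_index, value) for each entry that
    float() cannot parse, for example a label such as "gene_1".
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged matrix, row widths seen: {sorted(widths)}")
    bad_cells = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            try:
                float(value)
            except ValueError:
                bad_cells.append((i, j, value))
    n_cols = widths.pop() if widths else 0
    return len(rows), n_cols, bad_cells


def count_labels(label_text):
    """Count non-empty lines, assuming one label per line."""
    return sum(1 for line in label_text.splitlines() if line.strip())


def shape_matches(shape, n_row_labels, n_col_labels):
    """True if shape equals the label counts in either orientation."""
    return shape in {(n_row_labels, n_col_labels), (n_col_labels, n_row_labels)}
```

To use it on real files, read quants_mat.csv, quants_mat_rows.txt and quants_mat_cols.txt with `pathlib.Path(...).read_text()` and pass the text to these functions; a non-empty `bad_cells` list or a failed `shape_matches` points to the kind of mismatch discussed here.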
Many thanks for your detailed response and indeed you are correct regarding the matrix. I will get back to you once I have fixed these errors.
Many thanks for your help with the initial problem; that seems to be fixed. However, I have a separate issue, and I'm not sure why it is being triggered: the --countProb file is missing. Is there an example of what this file should look like? I can't seem to find it documented. Again, thanks for your help.
~/Documents/minnow/build/src/minnow simulate -i . -o output.dir/ -r ../../data/human_transcriptome.fasta -w ../../data/737K-august-2016.txt --splatter-mode --g2t ../../data/human_t2g.tsv --PCR 5 -e 0.001 -p 2 --dbg --gfa ../../data/human_transcriptome_debruijn.gfa --custom
Input directory .
Reference Fasta ../../data/human_transcriptome.fasta
Number of PCR cycles 5
Erorr rate 0.001
Numeber of threads 2
[2020-06-03 13:20:16.987] [minnow-Log] [info] Reading reference sequences ...
replaced 4 non-ACGT nucleotides with random nucleotides
Transcript file is read
[2020-06-03 13:20:18.546] [minnow-Log] [info] Reference sequence is loaded ...
Skipped 3016 transcripts because either short or not present in reference
[2020-06-03 13:20:18.909] [minnow-Log] [info] Number of genes in the txp2gene file: 55327
[2020-06-03 13:20:18.909] [minnow-Log] [info] Parsing ./quants_mat_cols.txt
=======================Reading Splatter Matrix=====================
[2020-06-03 13:20:18.910] [minnow-Log] [info] 140 cells are present
[2020-06-03 13:20:18.910] [minnow-Log] [info] Start parsing Splatter output
[2020-06-03 13:20:18.910] [minnow-Log] [info] Parsing ./quants_mat_rows.txt
In Splatter: Number of genes processed : 8000==================Done Parsing Splatter Matrix==================
[2020-06-03 13:20:19.012] [minnow-Log] [info] Splatter matrix is read, with dimension 140 x 8000
!!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments...
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths
Overlap size 101
Done with GFA
Equivalece class size 619336 trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering
Equivalece class size 513113 trSegmentMap size 196254 transcript map size 197385
[2020-06-03 13:20:30.319] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000[2020-06-03 13:20:30.351] [minnow-Log] [warning] Skipping 8000 genes, gene pool size of de-Bruijn graph 54809
[2020-06-03 13:20:30.355] [minnow-Log] [info] Truncated the matrix to dimension 140 x 0
RSPD :::
[2020-06-03 13:20:30.355] [minnow-Log] [warning] counting hard coded count prob file
[2020-06-03 13:20:30.355] [minnow-Log] [error] Invoked with DBG mode but --countProb file is not present
[2020-06-03 13:20:30.355] [minnow-Log] [error] Add --countProb option with the file countProb_pbmc_4k.txt
Hi,
When --dbg mode is used, minnow tries to replicate the multi-mapping histogram of a real sample in order to mimic its behavior. For human, one such file is already provided; it was constructed from a PBMC sample and is linked here: https://github.com/COMBINE-lab/minnow/blob/refactor/data/hg/countProb_pbmc_4k.txt
So you can run the experiment with the added option
--countProb countProb_pbmc_4k.txt
Great, this is exactly the same dataset that I want to use too.
However, when specifying this extra argument I get the following error:
!!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments...
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths
Overlap size 101
Done with GFA
Equivalece class size 619336 trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering
Equivalece class size 513113 trSegmentMap size 196254 transcript map size 197385
[2020-06-03 13:32:08.589] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000RSPD :::
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stoul: no conversion
Abort trap: 6
Sorry, I didn't post the full trace:
~/Documents/minnow/build/src/minnow simulate -i . -o output.dir/ -r ../../data/human_transcriptome.fasta -w ../../data/737K-august-2016.txt --splatter-mode --g2t ../../data/human_t2g.tsv --PCR 5 -e 0.001 -p 2 --dbg --gfa ../../data/human_transcriptome_debruijn.gfa --countProb ../../data/countProb_pbmc_4k.txt
Input directory .
Reference Fasta ../../data/human_transcriptome.fasta
Number of PCR cycles 5
Erorr rate 0.001
Numeber of threads 2
[2020-06-03 13:31:55.512] [minnow-Log] [info] Reading reference sequences ...
replaced 4 non-ACGT nucleotides with random nucleotides
Transcript file is read
[2020-06-03 13:31:57.008] [minnow-Log] [info] Reference sequence is loaded ...
Skipped 3016 transcripts because either short or not present in reference
[2020-06-03 13:31:57.339] [minnow-Log] [info] Number of genes in the txp2gene file: 55327
[2020-06-03 13:31:57.339] [minnow-Log] [info] Parsing ./quants_mat_cols.txt
=======================Reading Splatter Matrix=====================
[2020-06-03 13:31:57.340] [minnow-Log] [info] 140 cells are present
[2020-06-03 13:31:57.340] [minnow-Log] [info] Start parsing Splatter output
[2020-06-03 13:31:57.340] [minnow-Log] [info] Parsing ./quants_mat_rows.txt
In Splatter: Number of genes processed : 8000==================Done Parsing Splatter Matrix==================
[2020-06-03 13:31:57.444] [minnow-Log] [info] Splatter matrix is read, with dimension 140 x 8000
!!!!!!!!!!!!!!!!!! IN DBG MODE !!!!!!!!!!!!!!!!!!!!!!!
Start loading segments...
Saw 815590 contigs in total, unitigMap.size(): 619336
Max contig id 4683369
Starting to load paths
Overlap size 101
Done with GFA
Equivalece class size 619336 trSegmentMap size 0 transcript map size 197385
[DEBUG]-----0
Done Filtering
Equivalece class size 513113 trSegmentMap size 196254 transcript map size 197385
[2020-06-03 13:32:08.589] [minnow-Log] [info] The size of the gene id pool 54809
In Splatter: Number of genes processed : 8000RSPD :::
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: stoul: no conversion
Abort trap: 6
Sorry, scratch that last issue for the moment; I think there was a problem with downloading the data (problems with working from home).
Can you re-upload the files? Something in them is still not right, because it is skipping all 8000 genes. One reason could be a difference in gene names: the gene names in the rows file do not seem to contain "." characters. Can you make sure they are the same names as those used in ../../data/human_t2g.tsv? If not, it won't find the same gene names.
In any case, if you can upload the files, I can take a much deeper look around the weekend; unfortunately, I will be a bit occupied until then.
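The gene-name mismatch suspected here can be checked by comparing the names in the rows file with the gene column of the mapping file. This is a hedged sketch rather than anything minnow provides: the function names are hypothetical, it assumes the mapping file is tab-separated with the gene name in the second column (adjust `gene_column` if not), and it assumes that the "." in question separates a version suffix, as in an identifier like ENSG00000000003.14.

```python
def genes_in_mapping(tsv_text, gene_column=1):
    """Collect gene names from a tab-separated transcript-to-gene mapping.

    gene_column is the zero-based index of the gene field (an assumption).
    """
    genes = set()
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) > gene_column and fields[gene_column].strip():
            genes.add(fields[gene_column].strip())
    return genes


def strip_version(gene_id):
    """Drop everything from the first "." onward, e.g. "X.14" -> "X"."""
    return gene_id.split(".", 1)[0]


def compare_gene_names(row_names, mapping_genes):
    """Sort row names into exact matches, version-only mismatches, and misses."""
    unversioned = {strip_version(g) for g in mapping_genes}
    result = {"exact": [], "version_only": [], "missing": []}
    for name in row_names:
        if name in mapping_genes:
            result["exact"].append(name)
        elif strip_version(name) in unversioned:
            # Same gene once the version suffix is ignored.
            result["version_only"].append(name)
        else:
            result["missing"].append(name)
    return result
```

If most names land in `version_only`, the rows file and the mapping file probably differ only in their version suffixes; if they land in `missing`, the two files likely come from different annotations.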
It seems to be running, I will let you know if I have any further issues. Thanks for your help.
The countProb_pbmc_4k.txt was malformed (maybe patchy internet) but looks good now.
Will close as the software now runs. I haven't tested the output, but if I run into another problem I will open a separate issue. Thanks very much for your help and congratulations on your very useful piece of software.
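The thread does not say what exactly was wrong with the malformed countProb_pbmc_4k.txt, but a broken download is cheap to rule out before a parser fails on it. The sketch below is an assumption-laden illustration with hypothetical function names: it only catches two obvious cases, an empty file and a file that is really an HTML page (which can happen, for instance, when a web page URL is saved instead of the raw file). It does not validate the actual countProb format, which is not documented in this thread.

```python
def looks_like_html(text):
    """True if the text starts like an HTML document."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def download_problem(text):
    """Return a short description of an obvious download problem, or None."""
    if not text.strip():
        return "file is empty"
    if looks_like_html(text):
        return "file looks like an HTML page, not data"
    return None
```

A `None` result only means that neither of these two problems was found; the file may still be truncated or otherwise malformed.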
Hi,
I have compiled the latest code from GitHub and get the following errors while running in splatter-mode. Any help would be much appreciated, thanks.