biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org

ValueError: column index exceeds matrix dimensions #841

Closed: WaterKnight1998 closed this issue 4 years ago.

WaterKnight1998 commented 4 years ago

Hi, I am trying to convert a .txt file to BIOM with the following command:

biom convert -i table.txt -o table.from_txt_json.biom --table-type="OTU table" --to-json

However, the command fails with the following error: ValueError: column index exceeds matrix dimensions

How can I solve it?

wasade commented 4 years ago

Hi @WaterKnight1998, would it be possible to provide the full error output?

WaterKnight1998 commented 4 years ago

Hi @wasade

biom convert -i dada_table.txt -o table.from_txt_json.biom --table-type="OTU table" --to-json
Traceback (most recent call last):
  File "/prod/apps/conda/3/envs/qiime2-2019.7/bin/biom", line 11, in <module>
    sys.exit(cli())
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/cli/table_converter.py", line 114, in convert
    table = load_table(input_fp)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/parse.py", line 660, in load_table
    table = parse_biom_table(fp)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/parse.py", line 412, in parse_biom_table
    t = Table.from_tsv(fp, None, None, lambda x: x)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/table.py", line 4649, in from_tsv
    return Table(data, obs_ids, sample_ids, obs_metadata, sample_metadata)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/table.py", line 475, in __init__
    shape=shape)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/table.py", line 630, in _to_sparse
    mat = list_list_to_sparse(values, dtype, shape=shape)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/biom/table.py", line 4907, in list_list_to_sparse
    dtype=dtype)
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/scipy/sparse/coo.py", line 198, in __init__
    self._check()
  File "/prod/apps/conda/3/envs/qiime2-2019.7/lib/python3.6/site-packages/scipy/sparse/coo.py", line 287, in _check
    raise ValueError('column index exceeds matrix dimensions')
ValueError: column index exceeds matrix dimensions
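
The last frames show where the message originates: scipy's COO matrix constructor checks that every column index is smaller than the declared number of columns. The stdlib-only sketch below mimics that check rather than calling scipy or biom. The helper names to_triplets and check_coo_bounds are hypothetical, and the sketch assumes the matrix width comes from the number of sample IDs in the header while each value's column index comes from its position in the row.

```python
def to_triplets(rows):
    """Turn dense rows into (row, col, value) triplets, keeping nonzeros only."""
    return [(r, c, v)
            for r, values in enumerate(rows)
            for c, v in enumerate(values)
            if v != 0]


def check_coo_bounds(triplets, shape):
    """Raise ValueError, like scipy's COO check, if an index falls outside shape."""
    n_rows, n_cols = shape
    for r, c, _ in triplets:
        if r >= n_rows:
            raise ValueError("row index exceeds matrix dimensions")
        if c >= n_cols:
            raise ValueError("column index exceeds matrix dimensions")


# Hypothetical example: 2 observations with 3 counts each, but only 2 sample
# IDs were read from the header, so the matrix is declared as 2 x 2.
rows = [[217, 191, 97], [164, 172, 107]]
try:
    check_coo_bounds(to_triplets(rows), shape=(2, 2))
except ValueError as err:
    print(err)  # column index exceeds matrix dimensions
```

In other words, the error appears when a row carries more values than the header supplied sample IDs for.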

wasade commented 4 years ago

It looks like there may be too many column headers. If the last column is something like taxonomy, biom convert has options to use it; I think --header-key and the associated column name should do it.
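
To test the "too many column headers" hypothesis, one can count the tab-separated fields in the header and in each data row; in a well-formed table they match. A minimal stdlib sketch, assuming a tab-delimited file with no leading comment lines, so that the first non-blank line is the header. field_count_report is a hypothetical helper name.

```python
def field_count_report(lines, delim="\t"):
    """Return (header_field_count, {row_field_count: [line numbers]}).

    Assumes the first non-blank line is the header and every later non-blank
    line is an observation row (an ID followed by one value per sample).
    """
    non_blank = [(n, line.rstrip("\r\n"))
                 for n, line in enumerate(lines, 1)
                 if line.strip()]
    (_, header), body = non_blank[0], non_blank[1:]
    counts = {}
    for lineno, line in body:
        counts.setdefault(len(line.split(delim)), []).append(lineno)
    return len(header.split(delim)), counts


# Hypothetical table whose header has no label for the ID column.
example = ["500\t501\t502\n",
           "seq1\t217\t191\t97\n",
           "seq2\t164\t172\t107\n"]
print(field_count_report(example))  # (3, {4: [2, 3]})
```

A header count one lower than the row count suggests the ID column has no label, so the first sample name may be consumed as that label; a higher count suggests extra header fields. To run it on the real file, pass an open file handle, e.g. field_count_report(open("table.txt")).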

wasade commented 4 years ago

Examples here

WaterKnight1998 commented 4 years ago

It is an ASV table (unique sequences) obtained after processing 16S sequences with the DADA2 algorithm. It contains the counts of each ASV (rows) in each sample (columns), but the taxonomy of each ASV is not in any column of this table.

wasade commented 4 years ago

Would it be possible to share the table_head.txt file produced by head -n 6 table.txt | cut -f 1,2,3,4,5,6 > table_head.txt?

WaterKnight1998 commented 4 years ago

@wasade

With the first column unnamed:

500     501     502     503     504     505
seq1    217     191     97      254     127
seq2    164     172     107     224     100
seq3    717     1       296     35      24
seq4    11      0       3       16      2
seq5    125     131     49      183     76

With the first column named:

Taxa    500     501     502     503     504
seq1    217     191     97      254     127
seq2    164     172     107     224     100
seq3    717     1       296     35      24
seq4    11      0       3       16      2
seq5    125     131     49      183     76

wasade commented 4 years ago

Thanks! The header needs to start with "#OTU ID" rather than "Taxa". I'm not sure whether that is the whole issue, but it is one item. Below is an example of a TSV that biom accepts (produced from a 2.1.0-formatted file):

$ biom convert -i min_sparse_otu_table_hdf5.biom -o min_sparse_otu_table_hdf5.tsv --to-tsv
$ cat min_sparse_otu_table_hdf5.tsv
# Constructed from biom file
#OTU ID Sample1 Sample2 Sample3 Sample4 Sample5 Sample6
GG_OTU_1    0.0 0.0 1.0 0.0 0.0 0.0
GG_OTU_2    5.0 1.0 0.0 2.0 3.0 1.0
GG_OTU_3    0.0 0.0 1.0 4.0 0.0 2.0
GG_OTU_4    2.0 1.0 1.0 0.0 0.0 1.0
GG_OTU_5    0.0 1.0 1.0 0.0 0.0 0.0
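
Following the suggestion above, the header can be rewritten so that its first field is "#OTU ID". A minimal sketch, assuming a tab-delimited table whose first line is the header: if the header has as many fields as the first data row, its first field (e.g. "Taxa") is replaced; if it has one fewer, as in the unnamed-first-column case, "#OTU ID" is prepended. fix_header is a hypothetical helper, not part of biom.

```python
def fix_header(lines, delim="\t", label="#OTU ID"):
    """Return a copy of lines whose header starts with the OTU ID label.

    Assumes lines[0] is the header and the first non-blank later line is a
    data row (an ID followed by one value per sample).
    """
    header = lines[0].rstrip("\r\n").split(delim)
    first_row = next((l for l in lines[1:] if l.strip()), None)
    if first_row is None:
        raise ValueError("table has no data rows")
    n_row = len(first_row.rstrip("\r\n").split(delim))
    if len(header) == n_row:
        header[0] = label          # replace an existing label such as "Taxa"
    elif len(header) == n_row - 1:
        header.insert(0, label)    # the ID column had no label at all
    else:
        raise ValueError(
            f"header has {len(header)} fields but rows have {n_row}")
    return [delim.join(header) + "\n"] + list(lines[1:])


print(fix_header(["Taxa\t500\t501\n", "seq1\t217\t191\n"])[0], end="")
```

For a real file, read table.txt with readlines(), pass the list to fix_header, write the result to a new file, and run biom convert on that file instead.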

wasade commented 4 years ago

I think this issue is resolved.