fhcrc / taxtastic

Create and maintain phylogenetic "reference packages" of biological sequences.
GNU General Public License v3.0

remove dependency on pandas #108

Closed: nhoffman closed this 6 years ago

nhoffman commented 7 years ago

pandas is used in only a few places, so it seems we could easily reimplement those parts and drop the requirement.

% find taxtastic/ tests/ -name '*.py' | xargs grep pandas
taxtastic/subcommands/update_taxids.py:import pandas
taxtastic/subcommands/update_taxids.py:        rows = pandas.read_csv(args.infile, dtype='str')
taxtastic/subcommands/update_taxids.py:    except pandas.io.common.EmptyDataError as e:
taxtastic/subcommands/update_taxids.py:    merged = pandas.read_sql_table(
taxtastic/subcommands/update_taxids.py:    names = pandas.read_sql_table(
taxtastic/subcommands/update_taxids.py:        ranks = pandas.read_sql_table('ranks', engine, schema=args.schema)
taxtastic/subcommands/update_taxids.py:        nodes = pandas.read_sql_table(
taxtastic/subcommands/count_taxids.py:import pandas
taxtastic/subcommands/count_taxids.py:    lineage = pandas.read_csv(args.taxonomy, dtype=str)
taxtastic/subcommands/count_taxids.py:        seqinfo = pandas.read_csv(args.seq_info, usecols=['tax_id'], dtype=str)
taxtastic/subcommands/count_taxids.py:    counts = pandas.concat(rank_counts)

nhoffman commented 7 years ago

@crosenth - is either of these subcommands used in an existing pipeline? What does update_taxids do? It's a bit difficult to tell from the description.

crosenth commented 7 years ago

You tell me:

https://github.com/fhcrc/taxtastic/blame/d490776a31077fdc1fa1021e38e6058a2b4ee1c3/taxtastic/subcommands/update_taxids.py

Historically, the idea is that it uses the NCBI merged file to update tax_ids in any comma-delimited file with tax_id in the header.
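
For illustration only, here is a minimal sketch of that idea using just the standard library csv module. It is not the existing taxtastic code: the function names read_merged and update_taxids are hypothetical, it assumes the merged data comes as lines in the NCBI merged.dmp layout (old tax_id, "|", new tax_id, "|") rather than from the database tables the subcommand reads today, and the sample ids are only example data.

```python
import csv
import io


def read_merged(lines):
    """Build a dict mapping old tax_id -> new tax_id.

    Assumes merged.dmp-style lines: 'old<TAB>|<TAB>new<TAB>|'.
    """
    merged = {}
    for line in lines:
        fields = [f.strip() for f in line.split('|')]
        # Skip blank or malformed lines.
        if len(fields) >= 2 and fields[0]:
            merged[fields[0]] = fields[1]
    return merged


def update_taxids(infile, outfile, merged):
    """Copy CSV rows from infile to outfile, replacing merged tax_ids."""
    reader = csv.DictReader(infile)
    if not reader.fieldnames or 'tax_id' not in reader.fieldnames:
        raise ValueError('input must have a tax_id column in the header')
    writer = csv.DictWriter(
        outfile, fieldnames=reader.fieldnames, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        # Leave tax_ids that are not in the merged mapping unchanged.
        row['tax_id'] = merged.get(row['tax_id'], row['tax_id'])
        writer.writerow(row)


# Example data only; in practice these would be open file handles.
merged = read_merged(['12\t|\t74109\t|\n', '30\t|\t29\t|\n'])
src = io.StringIO('seqname,tax_id\nseq1,12\nseq2,1280\n')
dst = io.StringIO()
update_taxids(src, dst, merged)
result = dst.getvalue()
```

Every value stays a string here, which matches the dtype=str reads in the current code, and an empty input raises ValueError where the pandas version catches EmptyDataError.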

You can remove pandas from taxtastic, but there are significant advantages to using high-performance data structures to process taxonomy data on the filesystem. Perhaps that sort of functionality is more appropriate as part of a separate project?