biolink / kgx

KGX is a Python library for exchanging Knowledge Graphs
https://kgx.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add "parquet source" mapped to "GraphSource" to support the parquet sink #490

Closed: sierra-moxon closed this 5 months ago

sierra-moxon commented 5 months ago

caufieldjh commented 5 months ago

Thanks @sierra-moxon !

caufieldjh commented 5 months ago

This is still throwing an error from prepare_output_args() in cli_utils.py. I think I have a fix.

caufieldjh commented 5 months ago

False alarm - I was running the incorrect kgx version again.

But I'll take this chance to add material on parquet sink to the docs.

justaddcoffee commented 5 months ago

I see parquet as an available output option now as expected:

$ poetry run kgx transform --help 
Usage: kgx transform [OPTIONS] [INPUTS]...

  Transform a Knowledge Graph from one serialization form to another.

Options:
  -i, --input-format TEXT         The input format. Can be one of ('tsv',
                                  'csv', 'graph', 'json', 'jsonl', 'obojson',
                                  'obo-json', 'trapi-json', 'neo4j', 'nt',
                                  'owl', 'sssom', 'parquet')
  -c, --input-compression TEXT    The input compression type
  -o, --output PATH               Output
  -f, --output-format TEXT        The output format. Can be one of ('tsv',
                                  'csv', 'graph', 'json', 'jsonl', 'obojson',
                                  'obo-json', 'trapi-json', 'neo4j', 'nt',
                                  'owl', 'sssom', 'parquet')
[snip]

Should this be working now or no?

$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt 
[KGX][__init__.py][   transform_wrapper] ERROR: kgx.transform error: Type None not yet supported

caufieldjh commented 5 months ago

It appears to be working for me:

~/kgx$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt -i nt
[KGX][rdf_source.py][               parse] INFO: Done parsing tests/resources/rdf/test1.nt

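The difference is that you still have to specify the input format as well (here, -i nt). Without it, the input type is None, which is what the "Type None not yet supported" error is reporting.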

justaddcoffee commented 5 months ago

Excellent! Thanks @caufieldjh, works for me too:

$ poetry run kgx transform -f parquet -o tempout tests/resources/rdf/test1.nt -i nt
[KGX][rdf_source.py][               parse] INFO: Done parsing tests/resources/rdf/test1.nt
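
For anyone scripting this, here is a minimal sketch of a wrapper that refuses to build a `kgx transform` command unless the input format is given explicitly, since the thread shows that omitting `-i` produces "Type None not yet supported". The helper name `build_transform_command` and the `KNOWN_FORMATS` tuple are hypothetical and not part of KGX; the format list is copied from the `--help` output quoted above, and only the `-i`, `-f`, and `-o` flags shown there are used.

```python
from typing import List, Optional, Sequence

# Formats copied from the `kgx transform --help` output quoted in this thread.
# This tuple is an assumption for illustration; KGX keeps its own list.
KNOWN_FORMATS = (
    "tsv", "csv", "graph", "json", "jsonl", "obojson", "obo-json",
    "trapi-json", "neo4j", "nt", "owl", "sssom", "parquet",
)


def build_transform_command(
    inputs: Sequence[str],
    output: str,
    output_format: str,
    input_format: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `kgx transform` (hypothetical helper, not a KGX API).

    Fails early when the input format is missing, instead of letting the
    CLI fail later with "Type None not yet supported".
    """
    if input_format is None:
        raise ValueError("input_format is required, e.g. input_format='nt'")
    for label, fmt in (("input", input_format), ("output", output_format)):
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unknown {label} format: {fmt!r}")
    if not inputs:
        raise ValueError("at least one input file is required")
    # Same flags as the working invocation in this thread.
    return [
        "kgx", "transform",
        "-i", input_format,
        "-f", output_format,
        "-o", output,
        *inputs,
    ]


if __name__ == "__main__":
    cmd = build_transform_command(
        ["tests/resources/rdf/test1.nt"], "tempout", "parquet", input_format="nt"
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; in a Poetry checkout, prefix it with `["poetry", "run"]` to match the commands above.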