Mvp 1306 update neo node labelling generation script to ensure no duplicates

cellannotation / cap-pipeline-config

Building ontology pipeline configurations for the Cell Annotation Platform


Mvp 1306 update neo node labelling generation script to ensure no duplicates #48

Closed ubyndr closed 2 years ago

ubyndr commented 2 years ago

Fixes #41. Used a namedtuple to prevent duplicates in the neo_node_labelling section.

dosumis commented 2 years ago

RE: namedtuple. Are we doing this as opposed to a dictionary to ensure that the keys are strings?

It's hashable (as long as its component parts are) so can be used with set() to remove duplicates.
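A minimal sketch of that point, not the code from this PR: the `NodeLabel` type, its field names, and the example values below are hypothetical and only illustrate why a namedtuple of hashable values can be tracked in a set() while a dict cannot.

```python
from collections import namedtuple

# Hypothetical record shape; the real script's fields may differ.
NodeLabel = namedtuple("NodeLabel", ["label", "classes"])


def dedupe(entries):
    """Return the unique entries, keeping first-seen order."""
    seen = set()
    unique = []
    for entry in entries:
        # Membership test works because a namedtuple of hashable
        # values is itself hashable (a dict here would raise TypeError).
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


# Made-up example values, for illustration only.
entries = [
    NodeLabel("Kidney", "EX:0001"),
    NodeLabel("Lung", "EX:0002"),
    NodeLabel("Kidney", "EX:0001"),  # exact duplicate of the first entry
]
unique_entries = dedupe(entries)
```

Plain `set(entries)` would also remove the duplicate; the loop is used here only so the generated output keeps a stable order.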

ubyndr commented 2 years ago

I've tried to follow this guide to create the Python package: https://packaging.python.org/en/latest/tutorials/packaging-projects/. All the names are horrible, I guess; I'm open to suggestions. I have also uploaded the package source code here for testing purposes: https://github.com/ubyndr/config_utils. I'm also open to suggestions on where to keep it. @dosumis can you please review it again?

evanbiederstedt commented 2 years ago

I think it's standard to put the argparse section at the top of the script, or within a main() function to be run.

That is, this part:

parser = argparse.ArgumentParser(description='set destination YAML file for query output')
parser.add_argument('-f', '--file', default='../config/prod/neo4j2owl-config.yaml', help='''
    Use this option to indicate destination file for organ cell DL queries and semantic labels. By default, output
    is sent to a file named neo4j2owl-config.yaml.
    ''')
args = parser.parse_args()
file_name = args.file

That said, it's not a big deal either way.
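For reference, a rough sketch of the main() layout suggested above. The function names `parse_args` and `main` are assumptions rather than names from the actual script, and the body of `main` is a placeholder for the query-generation logic.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description='set destination YAML file for query output')
    parser.add_argument(
        '-f', '--file',
        default='../config/prod/neo4j2owl-config.yaml',
        help='''
        Use this option to indicate destination file for organ cell DL
        queries and semantic labels. By default, output is sent to a file
        named neo4j2owl-config.yaml.
        ''')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    file_name = args.file
    # Placeholder: the real script would generate the queries and
    # write them to file_name here.
    return file_name


if __name__ == '__main__':
    main()
```

Keeping the parsing in a function means importing the module has no side effects, and the options can be exercised with an explicit list such as `main(['-f', 'out.yaml'])`.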

ubyndr commented 2 years ago

I have changed the package name and uploaded it to https://pypi.org/project/cap-pipeline-config-utils/0.0.1/

dosumis commented 2 years ago

Was just expecting a local package rather than something on PyPI. Sorry if that wasn't clear. The code here looks good though: https://github.com/ubyndr/config_utils/blob/main/src/config_autogenerate_utils/utils.py

If we keep it as a package on PyPI, we need to think about a better home for the code. One option is to package it with the JSON schema and schema checker. I like the idea of the utils including schema update and schema validation.

More general discussion needed on factoring out this pipeline as a product distinct from VFB.

evanbiederstedt commented 2 years ago

> Was just expecting a local package rather than something on PyPI. Sorry if that wasn't clear. The code here looks good though: https://github.com/ubyndr/config_utils/blob/main/src/config_autogenerate_utils/utils.py
>
> If we keep it as a package on PyPI, we need to think about a better home for the code. One option is to package it with the JSON schema and schema checker. I like the idea of the utils including schema update and schema validation.

@ubyndr BTW, we should introduce you to how we host packages internally on GCP. That's probably more appropriate for what's needed here than PyPI (not that there's anything wrong with PyPI).

@ilguzin is the specialist on this

ubyndr commented 2 years ago

> Was just expecting a local package rather than something on PyPI. Sorry if that wasn't clear. The code here looks good though: https://github.com/ubyndr/config_utils/blob/main/src/config_autogenerate_utils/utils.py
>
> If we keep it as a package on PyPI, we need to think about a better home for the code. One option is to package it with the JSON schema and schema checker. I like the idea of the utils including schema update and schema validation.
>
> More general discussion needed on factoring out this pipeline as a product distinct from VFB.

Can I close this PR as it is, and we can open another issue for the packaging discussion? @dosumis