Global-Chem / global-chem

A Knowledge Graph of Common Chemical Names to their Molecular Definition
https://globalchemistry.org/
Mozilla Public License 2.0

SS-185: Turn Global-chem into a biobrick #317

Open Sulstice opened 4 months ago

Sulstice commented 4 months ago

We are going to take Global-Chem and turn it into a biobrick.

I opened an issue for it, and the code is here: https://github.com/Global-Chem/global-chem-brick. The first thing to do is create a Python file that converts the CSV from global-chem into a Parquet file.

It should be a one-liner. Lay out the directory like the one in this repository: https://github.com/biobricks-ai/drugbank-open.


And place your script in the global-chem brick repository: https://github.com/Global-Chem/global-chem-brick

Sulstice commented 3 months ago

@Nickspizza001 So let's start slow with this one.

The first thing to do would be to take the global-chem TSV and convert it into a Parquet file.

  1. Create a directory called transformers.
  2. In that directory, write a Python script that takes in the TSV file and converts it to a Parquet file. Tell me why Parquet is used.
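
The two steps above could be sketched roughly as follows. This is only an illustration, not the script that ended up in the brick repository: the file name `transformers/tsv_to_parquet.py`, the input name `global_chem.tsv`, and the `brick` output directory are all assumptions, and it relies on pandas with a Parquet engine such as pyarrow being installed.

```python
# transformers/tsv_to_parquet.py (hypothetical file name)
import sys
from pathlib import Path


def parquet_path_for(tsv_path: str, out_dir: str = "brick") -> Path:
    """Derive the output path, e.g. data/global_chem.tsv -> brick/global_chem.parquet.

    The "brick" output directory is an assumption for this sketch.
    """
    return Path(out_dir) / (Path(tsv_path).stem + ".parquet")


def tsv_to_parquet(tsv_path: str, out_dir: str = "brick") -> Path:
    """Read a tab-separated file and write it back out as Parquet."""
    # Imported here so the path helper works without pandas installed.
    import pandas as pd  # also needs a Parquet engine such as pyarrow

    out_path = parquet_path_for(tsv_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # sep="\t" reads tab-separated input; drop it if the source is a CSV.
    pd.read_csv(tsv_path, sep="\t").to_parquet(out_path, index=False)
    return out_path


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python transformers/tsv_to_parquet.py global_chem.tsv
    print(tsv_to_parquet(sys.argv[1]))
```

The conversion itself is the single `read_csv(...).to_parquet(...)` line; everything else only decides where the output goes.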

Open a pull request and show me.

Nickspizza001 commented 3 months ago

https://github.com/Global-Chem/global-chem-brick/tree/dami

Parquet is used because its columnar layout compresses large files better than row-oriented text formats such as CSV or TSV.