MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Can't serialize to JSON #591

Closed mattwthompson closed 1 year ago

mattwthompson commented 4 years ago

Describe the bug

I wanted to save a collection to disk to avoid downloading a large dataset every time I ran a test or restarted a notebook.

To Reproduce

import json

import qcportal as ptl

client = ptl.FractalClient()
ds = client.get_collection('OptimizationDataset', 'OpenFF Optimization Set 1')
ds.to_json(filename='data.json')

raises TypeError: Object of type set is not JSON serializable

Expected behavior

I expected to be able to save this out to JSON.

Additional context

There is data in the collection object:

[image]

@loriab suggested on Slack that a set may have snuck in somewhere. This is probably a terrible collection to debug on since it includes something like 20,000 records.

bennybp commented 4 years ago

Ok, I see the problem. The history key is a set.
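The failure can be reproduced without the server. The following is a minimal sketch, assuming only that the serialized collection is a dict whose history value is a Python set. The dict contents and the encode_sets helper are made-up placeholders for illustration, not QCFractal data or API.

```python
import json

# Hypothetical stand-in for the collection data; only the set matters here.
data = {
    "name": "example",
    "history": {("optimization", "geometric"), ("optimization", "psi4")},
}

# The standard json module has no encoding for sets, so this raises TypeError.
try:
    json.dumps(data)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)


def encode_sets(obj):
    """Fallback for json.dumps: turn sets into sorted lists.

    sorted() assumes the entries are mutually comparable; use list(obj)
    instead if they are not.
    """
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# With the fallback in place, the same dict serializes.
text = json.dumps(data, default=encode_sets)
print(text)
```

Passing a default hook only changes how the data is written. Whatever reads the file back still receives a list of lists, not a set of tuples.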

dotsdl commented 4 years ago

Diving into this one now.

dotsdl commented 4 years ago

@bennybp I see you made a commit that addressed history being a set in a branch on your fork. Are you planning to merge this? Otherwise I can make the change in a PR here.

bennybp commented 4 years ago

I started to fix it in that PR but abandoned it. It is a little more involved than just changing it to a list (there are some places that use the set functionality that also have to be modified).

Looking at the database, the database stores this info as JSON. Not entirely sure where this gets converted from a set to a list on the backend...

dotsdl commented 4 years ago

Ah cool, thank you for that clarification. Working on a solution that doesn't fail in other places.

mattwthompson commented 2 years ago

I ran into this again today. For my purposes, the quickest solution is to just pop the history. There's probably a way to map it onto a list, but I don't think I need it for my use case, and just data['history'] = list(data['history']) did not completely work: it was happy to write to disk but could not be read back. I didn't look further into why.


import json

import qcportal

client = qcportal.FractalClient(verify=False)

dataset = client.get_collection(
    "OptimizationDataset",
    "OpenFF Iodine Chemistry Optimization Dataset v1.0",
)

with open("dataset.json", "w") as file:
    data = dataset.to_json()
    # history is a set, which json cannot serialize, so drop it before dumping
    data.pop("history")

    json.dump(data, file)

with open("dataset.json", "r") as file:
    data = json.load(file)
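
If the history is needed, one option is to convert it explicitly in both directions rather than dropping it. This is only a sketch on a hypothetical stand-in dict: it assumes history is a set of tuples of JSON-friendly values, and it does not explain why the list-converted file mentioned above could not be read back. JSON has no set or tuple type, so a set of tuples comes back as a list of lists and has to be rebuilt by hand.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dict returned by dataset.to_json().
data = {
    "name": "example",
    "history": {("optimization", "default"), ("gradient", None)},
}

# Copy the dict, replacing the set with a list.
# key=repr lets entries that contain None still be ordered deterministically.
on_disk = dict(data, history=sorted(data["history"], key=repr))

path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as file:
    json.dump(on_disk, file)

with open(path, "r") as file:
    loaded = json.load(file)

# json.load returns a list of lists; rebuild the set of tuples.
loaded["history"] = {tuple(entry) for entry in loaded["history"]}

print(loaded == data)
```

Whether the rebuilt dict is accepted by the collection class itself depends on the qcportal version and is not checked here.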

bennybp commented 1 year ago

Superseded by #740 for v0.50