Closed by mattwthompson 1 year ago
Ok, I see the problem. The `history` key is a `set`.
Diving into this one now.
@bennybp I see you made a commit that addressed `history` being a `set` in a branch on your fork. Are you planning to merge this? Otherwise I can make the change in a PR here.
I started to fix it in that PR but abandoned it. It is a little more involved than just changing it to a list (there are some places that use the set functionality that also have to be modified).
Looking at the database, the database stores this info as JSON. Not entirely sure where this gets converted from a set to a list on the backend...
Ah cool, thank you for that clarification. Working on a solution that doesn't fail in other places.
I ran into this again today. For my provenance, the quickest solution is to just pop the history. There's probably a way to map it onto a list, but I don't think I need it for my use case, and just `data['history'] = list(data['history'])` did not completely work: it was happy to write to disk but could not be read back. I didn't look further into why.
```python
import json

import qcportal

client = qcportal.FractalClient(verify=False)
dataset = client.get_collection(
    "OptimizationDataset",
    "OpenFF Iodine Chemistry Optimization Dataset v1.0",
)

# Drop the set-valued "history" entry, which json.dump cannot serialize.
with open("dataset.json", "w") as file:
    data = dataset.to_json()
    data.pop("history")
    json.dump(data, file)

# Read the file back to confirm the round trip.
with open("dataset.json", "r") as file:
    data = json.load(file)
```
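If the history needs to survive the round trip rather than be dropped, one option is to hand `json.dumps` a `default` hook that turns sets into lists. This is only a sketch using the standard library: `encode_sets` is a hypothetical helper, the `data` dict is a made-up stand-in for what `dataset.to_json()` returns, and the shape of the history entries (tuples of strings) is an assumption. Whether the collection accepts the list form when it is loaded again is not settled in this thread.

```python
import json


def encode_sets(obj):
    """Fallback for json.dumps: serialize sets as sorted lists (hypothetical helper)."""
    if isinstance(obj, (set, frozenset)):
        # Sort by repr so the output order is stable even for mixed element types.
        return sorted(obj, key=repr)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Made-up stand-in for the dict returned by dataset.to_json().
data = {
    "name": "example",
    "history": {("torsiondrive", "default"), ("optimization", "default")},
}

text = json.dumps(data, default=encode_sets)
restored = json.loads(text)

# JSON has no set or tuple type, so the history comes back as a list of lists.
print(restored["history"])
```

Any code that later relies on set behavior would have to convert the loaded value back, for example with `{tuple(item) for item in restored["history"]}`.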
Superseded by #740 for v0.50
Describe the bug
I wanted to save a collection to disk in order to avoid needing to download a large dataset every time I ran a test or restarted a notebook.
To Reproduce
raises
TypeError: Object of type set is not JSON serializable
Expected behavior
I expected to be able to save this out to JSON.
Additional context
There is data in the collection object:
@loriab suggested on Slack that a `set` may have snuck in somewhere. This is probably a terrible collection to debug on since it includes something like 20,000 records.
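For a collection this large, a small recursive walk over the dict can report where any set lives instead of inspecting records by hand. The sketch below uses only the standard library; `find_sets` is a hypothetical helper and the `example` dict is invented for illustration, not taken from the real dataset.

```python
def find_sets(obj, path="data"):
    """Yield the path to every set found inside nested dicts, lists, and tuples."""
    if isinstance(obj, (set, frozenset)):
        yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_sets(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_sets(value, f"{path}[{index}]")


# Invented data with one top-level set and one nested inside a record.
example = {
    "name": "demo",
    "history": {"a"},
    "records": [{"tags": {"x"}}, {"tags": []}],
}

for location in find_sets(example):
    print(location)
```

Running the same function on the dict returned by `dataset.to_json()` should list every key that would trip `json.dump`, assuming that dict contains only plain containers.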