Closed CodyCBakerPhD closed 1 year ago
Python code for parsing the exported temporary (and simplified) .tsv
import json
from pathlib import Path

from pandas import read_table
from pandas.io.json import build_table_schema

# Load the exported .tsv
table_path = Path("C:/Users/Raven/Downloads/Ecosystem Format Support v3 - Simplified - Ecephys.tsv")
table = read_table(filepath_or_buffer=table_path)

# Serialize each row as a JSON object and collect the rows into a list
json_serialization = table.T.to_json()
json_table = list(json.loads(json_serialization).values())

json_path = table_path.parent / (table_path.stem + ".json")
with open(file=json_path, mode="w") as io:
    io.write(json.dumps(json_table, indent=4))

# Generate the pandas Table Schema for the same table
json_schema = build_table_schema(data=table)

json_schema_path = table_path.parent / (table_path.stem + "_schema.json")
with open(file=json_schema_path, mode="w") as io:
    io.write(json.dumps(json_schema, indent=4))
Minor note: The 'schema' is in table schema format, not the usual JSON form format; it doesn't seem to capture that the version column can be either 'string' or 'null'.
Once we get the ecephys version of this polished to both our liking, we can quickly handle the other modalities in follow-ups.
If we want to keep this indexed format, I'd suggest changing the encapsulating object into an array—though I imagine this may be incompatible with the table schema format:
[
    {
        "Format": "AlphaOmega",
        "Versions": null,
        "Suffix(es)": ".mpx",
        "Example Data": true,
        "Neo - Raw IO": true,
        "Neo - Tests": true,
        "SpikeInterface - Extractor": true,
        "SpikeInterface - Tests": true,
        "NeuroConv - Interface": true,
        "NeuroConv - Tests": true,
        "NWB GUIDE": false
    }
]
Otherwise, to make parsing even simpler, I was thinking we could simplify the structure further and drop unnecessary data specification:
{
    "AlphaOmega": {
        "Suffix(es)": ".mpx",
        "Example Data": true,
        "Neo": {
            "RawIO": true,
            "Tests": true
        },
        "SpikeInterface": {
            "Extractor": true,
            "Tests": true
        },
        "NeuroConv": {
            "Interface": true,
            "Tests": true
        }
    }
}
Where null or false values are implicit and final headers can be aggregated based on all the keys that are used in the structure.
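To make the trade-off concrete, here is a minimal sketch of how a parser might expand that compact form back into explicit rows. The function names, the " - " separator, and the second entry ("HypotheticalFormat") are assumptions for illustration only. Whether a missing cell should become null or false is also an assumption; this sketch fills with null (`None`).

```python
from __future__ import annotations

from typing import Any


def flatten_entry(entry: dict[str, Any], separator: str = " - ") -> dict[str, Any]:
    """Flatten one level of nesting, e.g. {"Neo": {"Tests": True}} -> {"Neo - Tests": True}."""
    flat: dict[str, Any] = {}
    for key, value in entry.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def expand_table(compact: dict[str, dict[str, Any]]) -> tuple[list[str], list[dict[str, Any]]]:
    """Return (headers, rows), where every row carries every header explicitly."""
    flat_rows = {name: flatten_entry(entry) for name, entry in compact.items()}

    # Aggregate the headers from every key used anywhere, in first-seen order.
    headers = ["Format"]
    for row in flat_rows.values():
        for key in row:
            if key not in headers:
                headers.append(key)

    rows = []
    for name, row in flat_rows.items():
        full_row: dict[str, Any] = {"Format": name}
        for header in headers[1:]:
            full_row[header] = row.get(header)  # assumption: missing cells become None
        rows.append(full_row)
    return headers, rows


compact = {
    "AlphaOmega": {
        "Suffix(es)": ".mpx",
        "Example Data": True,
        "Neo": {"RawIO": True, "Tests": True},
        "SpikeInterface": {"Extractor": True, "Tests": True},
        "NeuroConv": {"Interface": True, "Tests": True},
    },
    # Hypothetical second entry, only here to show header aggregation.
    "HypotheticalFormat": {"Suffix(es)": ".xyz", "NWB GUIDE": True},
}

headers, rows = expand_table(compact)
```

Note that the parser has to pick the fill value itself, since nothing in the compact form says whether an absent key means null or false.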
You mentioned being maximally explicit. Is this why you're steering clear of a system like this?
If we want to keep this indexed format, I'd suggest changing the encapsulating object into an array—though I imagine this may be incompatible with the table schema format:
Looking more into it again: https://www.bluefeathergroup.com/docs/accordion-tables/the-json-table-structure/example-simple-table/
It seems there's a fair amount of freedom in how to JSON-ify the table. What you see here is merely the direct output of pandas convenience functionality; I could coerce it into the more rigorous form with a bit of extra work.
You mentioned being maximally explicit. Is this why you're steering clear of a system like this?
Partly - the main reason is that I'm enforcing PEP 20 principles
https://peps.python.org/pep-0020/
Explicitness is the number one goal, and flatness also comes in later. Sometimes nesting is better when values need to be validated against or with nearby associated values (mostly thinking of pydantic there), but IDK about this case since we're not really doing much complicated validation.
Otherwise, to make parsing even simpler, I was thinking we could simplify the structure further and drop unnecessary data specification:
I figured that would only work if the schema specifies that a given column's values are optional (and to fill with null if missing)
Can you try parsing this current .json in a splinter branch and see whether your parsers work under the current form and how they render the output? (Not sure how you'd generate/share the demo web page, since GitHub Pages probably only renders from main, not dev branches.)
I'll make a fork to share the updates with you
I figured that would only work if the schema specifies that a given column's values are optional (and to fill with null if missing)
This is an entirely separate system from the GUIDE, so I technically don't need the schema at all and can simply make reasonable assumptions. So whether we'd like to keep the schema / actually use it for the generation is up to you.
@garrettmflynn I'm referring to the generated table schema: https://github.com/catalystneuro/format-support-table/pull/3/files#diff-f8d9ebc30eeb55fde7a40523b2cb34bf94517bbca0a37f381f57e12fccd258c2R1, not anything in the GUIDE
Only if it's useful on your side, which I wasn't sure about. I also included it because I figured there would be somewhere to attach column descriptions, but it doesn't look like there is.
Sure. Yeah I just meant none of this code actually relies on it—and it looks like it won't really need to.
We can always bring it back later. This is very predictable for the moment.
Here's that fork serving from my updated add_json_table_ecephys branch: http://garrettflynn.com/format-support-table/
@garrettmflynn That looks like it's parsing pretty easily then
I updated the file on this branch to use arrays instead of dictionaries as suggested
Also went ahead and removed the schema file
For the future URL links, I think it might be easier, or make more sense, to have a separate table of the same size where every cell holds an optional URL. Then you treat that as a mask over the main table and work out a way to add hyperlinks around the elements (or w/e other way is easier)
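A rough sketch of that mask idea, assuming both tables are lists of row dictionaries with identical keys. The function name `apply_url_mask` and the example.com URL are placeholders, not anything that exists in the repository.

```python
from __future__ import annotations

from typing import Any, Optional


def apply_url_mask(
    table: list[dict[str, Any]],
    urls: list[dict[str, Optional[str]]],
) -> list[dict[str, tuple[Any, Optional[str]]]]:
    """Pair every cell of `table` with the optional URL at the same position in `urls`."""
    if len(table) != len(urls):
        raise ValueError("The URL table must have the same number of rows as the main table.")

    linked = []
    for row, url_row in zip(table, urls):
        if row.keys() != url_row.keys():
            raise ValueError("Each URL row must have the same columns as the matching table row.")
        linked.append({column: (value, url_row[column]) for column, value in row.items()})
    return linked


main_table = [{"Format": "AlphaOmega", "Neo - Raw IO": True, "NWB GUIDE": False}]

# Same shape as main_table; None means "no link for this cell".
url_table = [{"Format": None, "Neo - Raw IO": "https://example.com/placeholder", "NWB GUIDE": None}]

linked = apply_url_mask(main_table, url_table)
```

The shape checks are the fragile part of this design: the two tables have to be kept in sync by hand whenever a row or column is added.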
I'd just say an entry can either be a value OR an object with a "value" key and any other metadata—which we can handle however we like.
Is that fine?
I'd just say an entry can either be a value OR an object with a "value" key and any other metadata—which we can handle however we like.
I'll play with that in a follow-up. The linkage would make sense in that case to avoid shape/index mismatches, so yes, that sounds like a good idea
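For reference, a minimal sketch of how a reader could handle the "value OR object with a "value" key" convention. The helper name `normalize_entry` and the "url" metadata key are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


def normalize_entry(entry: Any) -> tuple[Any, dict[str, Any]]:
    """Return (value, metadata) for a cell that is either a bare value or a {"value": ...} object."""
    if isinstance(entry, dict):
        if "value" not in entry:
            raise ValueError('Object entries must contain a "value" key.')
        metadata = {key: item for key, item in entry.items() if key != "value"}
        return entry["value"], metadata
    # Bare values (bool, str, None, ...) carry no metadata.
    return entry, {}


plain = normalize_entry(True)
with_metadata = normalize_entry({"value": True, "url": "https://example.com/placeholder"})
missing = normalize_entry(None)
```

Because the metadata travels with the cell, there is no second table whose shape or indices could drift out of sync with the main one.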
Sweet, I've updated my fork to read from the array-based JSON.
The only tweak I'd suggest is converting "Suffix(es)" and "Versions" to their singular form, since each is now associated with a format explicitly and so isn't really a collection.
Actually, I just realized we are allowing multiple suffixes and versions. There just might be multiple rows per format anyway. Never mind...
@garrettmflynn Anything else for this PR? If not we can merge and then I'll generate similar .json for the other two tables
@garrettmflynn Here is the first example of that JSON reduction of the table data for Ecephys