Closed mjarrett closed 5 days ago
as my understanding is that in general a resource corresponds to a single file
I would say instead in general a resource corresponds to a single table.
When a format stores a single table per file, then yes, there's a 1-1 relationship between resources and files. But in the case of spreadsheets (and databases), where multiple tables are stored in a single file, you'll end up having multiple resources (one for each table) defined using that same file. So the example definition you shared looks good to me!
One thing to be aware of -- version 2 of the table dialect standard uses sheetName
instead of sheet
. I believe at some point this will be implemented in frictionless-py, but I'm not sure what the timeline is/will be.
Oh oops, I didn't notice the resources in your example also have the same name! You'll want to make those unique within the data package. A naming scheme something like this would do, for example:
{
"profile": "tabular-data-package",
"resources": [
{
"name": "mydata-sheet1",
"path": "mydata.xlsx",
"dialect": {
"sheet": "sheet1"
},
{
"name": "mydata-sheet2",
"path": "mydata.xlsx",
"dialect": {
"sheet": "sheet2"
}
}
]
}
Thank @khusmann thinking of a resource as a table instead of a file makes perfect sense, I'll go with something like your example.
I'm interested in how people are applying the spec to the case where an Excel workbook (.xlsx) has tabular data in multiple worksheets.
I see this issue (https://github.com/frictionlessdata/datapackage/issues/793) which suggests specifying the sheet name in the dialect attribute. The example provided makes sense but only if there is a single worksheet in the file. I've been considering a case like this:
But in the above case there are two resources with the same name and path. I'm not even sure the worksheets should be considered separate resources, as my understanding is that in general a resource corresponds to a single file. I suppose this is not limited to Excel, as there are other file types where multiple tables can be saved in a single file.
I'm very interested if/how others have dealt with this situation. Examples would be very appreciated!