Open peterdesmet opened 3 months ago
Thanks @peterdesmet for the nice summary. Here below my two cents
get_schema()
needs package$directory
I like option 2, adding directory
to resource
. Maybe you would like to add it as attribute instead? Benefit: you don't need to remove it in write_resource
. It is in some way "hidden", but still accessible, e.g. attr(x$resources[[1]], "package_directory")
.
Here a reprex:
library(frictionless)
x <- example_package()
# Add directory as attribute of the resources
x$resources <- purrr::map(x$resources, ~ {
attr(.x, "package_directory") <- x$directory
.x
})
# Show the added attribute by printing all attributes of the resources
purrr::walk(x$resources, ~ print(attributes(.x)))
#> $names
#> [1] "name" "path" "profile" "title" "format" "mediatype"
#> [7] "encoding" "schema"
#>
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"
#>
#> $names
#> [1] "name" "path" "profile" "title" "format" "mediatype"
#> [7] "encoding" "schema"
#>
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"
#>
#> $names
#> [1] "name" "data" "profile" "title" "schema"
#>
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"
Created on 2024-07-18 with reprex v2.1.0
resource
for get_schema()
Indeed, I agree with all the statements here. In particular, it sounds to me not completely logical that the a user can read the data of a resource, but not getting it "as it is".
Moving the resource package checks to a check_resource()
is also very nice as it makes the checks more modular and so better software speaking (better to debug/refactor/read...).
Maybe you would like to add it as attribute instead?
For a while I've been thinking about an approach close to this, but where resources are loaded by read_resource
as subclassed tibbles and contain all the resource metadata in a "frictionless" attribute. (This would scope all the frictionless attrs inside the tibble, separating them from other tibble attrs related to its display, e.g. class
, row.names
, etc).
But I guess here we're talking about just resource descriptors, without their data loaded. Still, I think a resource class would be an interesting approach -- this class could be upgraded to the subclassed tibble when the user decided to load data (but then after loading data the object would still work with all the original descriptor fns)
The downside is that (public) functions that use resource should then check if it is valid.
The nice thing about making the resource descriptor its own class, is that you wouldn't need to always be scanning the descriptor to see if it was valid if you only allowed valid resources to be stamped with the custom class. (So the class operates in a sort of newtype
pattern)
This would mean that you would not want to allow resource descriptors to be mutated directly; instead you'd provide an API that would allow the user to add / remove descriptor properties but always verify they resulted in a valid descriptor. (Similar to what I suggest in #198)
I'll see about pulling together an example to demonstrate more of what I mean...
The following helper functions have
package
andresource_name
as arguments:locale()
read_from_path()
write_resource()
But they are called from functions that already have the
resource
. That means that currently, they recallget_resource()
, in the case forlocale()
for a third time:read_resource()
->read_from_path()
->locale()
This could be simplified by passing
resource
as an argumentlocale()
: needsresource$encoding
andfields
. Callsget_schema()
read_from_path()
: needsresource$dialect
,schema
andfield_names
. Callsget_schema()
write_resource()
: needsresource
get_schema()
needspackage$directory
The problem with
get_schema()
is that is needspackage$directory
, so merely passing the resource is not enough. Solutions:directory
argument toget_schema()
: no, this is clunkydirectory
toresource
as part ofget_resource()
. That way,get_schema()
can get it from there. It can be removed as part ofwrite_resource()
Using
resource
forget_schema()
If the helper functions use
resource
as an argument, one could argue to also use it for the public functionget_schema(resource)
. If we do that,get_resource()
should become a public function too, which can be useful (especially for developers). The downside is that (public) functions that useresource
should then check if it is valid. Most of theget_resource()
checks should then move to acheck_resource()
function.@damianooldoni @khusmann others... Thoughts?