Open ChasNelson1990 opened 3 months ago
Maybe someone has already done this in another Fjelltopp CKAN project?
@A-Souhei - perhaps have a read of this: https://ngr.coar-repositories.org/behaviour/resource-transfer/
I think there could be an argument that the datastore API does this for CSV-style data? What about textual data though - is there an existing solution?
@ChasNelson1990 based on your comment, if I understand correctly, my idea would be:
Also, could integrating a small LLM help translate the text for better / more results? (just a thought, maybe not such a good one)
Otherwise, we can use the CKAN API to query resources and resource metadata: https://docs.ckan.org/en/latest/api/legacy-api.html
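A rough sketch of what querying resources and their metadata over the API could look like. It assumes the current action API (/api/3/action/...) rather than the legacy API linked above; the base URL and dataset name are placeholders, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def action_url(base_url, action, **params):
    """Build a URL for CKAN's action API (/api/3/action/<name>)."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url.rstrip('/')}/api/3/action/{action}{query}"


def call_action(base_url, action, **params):
    """GET an action and return its 'result', raising if CKAN reports failure."""
    with urlopen(action_url(base_url, action, **params)) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


def resource_summaries(dataset):
    """Pick out the per-resource fields another system would need to copy the files."""
    return [
        {"id": r["id"], "url": r.get("url"), "format": r.get("format")}
        for r in dataset.get("resources", [])
    ]


if __name__ == "__main__":
    # Placeholder instance and dataset name; replace with real values.
    dataset = call_action("https://ckan.example.org", "package_show", id="my-dataset")
    for summary in resource_summaries(dataset):
        print(summary)
```

This only covers reading metadata and file URLs; it does nothing for searching inside textual files, which is the open question here.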
Has anybody made an existing ckanext that processes text files like that?
The LLM is a good idea too - but too big for ZaRR
@ChasNelson1990 https://github.com/stadt-karlsruhe/ckanext-extractor looks promising.
@A-Souhei - that extension is quite old (even the language they are using is really old CKAN stuff). There is a fork here where somebody has updated it for CKAN 2.9, but it may not work for 2.10.
If you wanted to, I would spend 1-2 hours installing it, adding it to the local dev config and just uploading one file to see if it works. If it looks like it works then great, we can invest some time in making sure it's up-to-date... but if you can't get it working in an hour then we should discuss further whether this is the right solution.
Alright, I'll create a ticket for it.
Just use this ticket @A-Souhei
@ChasNelson1990 while I was able to install the plugin without issue, running the extraction with the command ckan -c /etc/ckan/production.ini extractor extract all
raises an error in https://github.com/dathere/ckanext-extractor/blob/master/ckanext/extractor/cli.py. The stack trace is:
2024-08-14 09:33:43,978 INFO [ckanext.extractor.cli] Extraction started ...
0414096d-6160-4c69-b5f7-2a9d8ad1d40c: Traceback (most recent call last):
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/lib/navl/dictization_functions.py", line 246, in convert
nargs = converter.__code__.co_argcount
AttributeError: type object 'str' has no attribute '__code__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/ckan/venv/bin/ckan", line 8, in <module>
sys.exit(ckan())
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/src/ckanext-extractor/ckanext/extractor/cli.py", line 78, in extract
result = extract(context, {'id': id, 'force': force})
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/logic/__init__.py", line 580, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/src/ckanext-extractor/ckanext/extractor/logic/helpers.py", line 42, in wrapped
return f(context, data_dict)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/logic/__init__.py", line 678, in wrapper
data_dict, errors = _validate(data_dict, schema, context)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/lib/navl/dictization_functions.py", line 305, in validate
flat_data, errors = _validate(flattened, schema, validators_context)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/lib/navl/dictization_functions.py", line 356, in _validate
convert(converter, key, converted_data, errors, context)
File "/usr/lib/ckan/.minikubevenv/ckan-ALitmJXH/lib/python3.8/site-packages/ckan/lib/navl/dictization_functions.py", line 248, in convert
raise TypeError(
TypeError: str cannot be used as validator because it is not a user-defined function
I'll need more time to investigate.
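For reference, a minimal reproduction of what the traceback seems to describe: the validation step reads converter.__code__ to count arguments, and the built-in str type has no __code__, so it is rejected. The convert function below is a simplified stand-in written for this comment, not CKAN's actual code, and both schemas are hypothetical; where the extension actually passes str has not been located yet.

```python
def convert(converter, value):
    """Simplified stand-in for the argument-count check seen in the traceback."""
    try:
        nargs = converter.__code__.co_argcount
    except AttributeError:
        # Built-in types such as str have no __code__ attribute.
        raise TypeError(
            f"{converter.__name__} cannot be used as validator "
            "because it is not a user-defined function"
        )
    if nargs != 1:
        raise TypeError("this sketch only handles one-argument validators")
    return converter(value)


def to_text(value):
    """User-defined wrapper around str, so it does have __code__."""
    return str(value)


# Hypothetical schemas, only to show the difference.
broken_schema = {"id": [str]}
fixed_schema = {"id": [to_text]}

if __name__ == "__main__":
    print(convert(fixed_schema["id"][0], 42))
    try:
        convert(broken_schema["id"][0], 42)
    except TypeError as exc:
        print(exc)
```

If this reading is right, swapping str in the extension's schema for a plain function (or possibly one of CKAN's own validators such as unicode_safe, untested here) might be enough to get past this error.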
The metadata and the resources in the repository can be copied or migrated to other systems