Closed jsheunis closed 12 months ago
Ideas by @mslw:
A simple solution to cut down translation time massively would be to take the translator and call its `translate()` method directly, bypassing the discovery process. Another idea worth exploring is to forget the file traversal, extraction, and translation altogether, and try to get catalog-schema metadata from git annex output for the dataset.
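A minimal sketch of the first idea, assuming a single known translator is fixed upfront. All class and field names here are illustrative assumptions, not datalad-catalog's actual API:

```python
# Hypothetical sketch: skip translator discovery by fixing the
# translator upfront and calling translate() directly per record.
# MyTranslator and the record fields are assumptions for illustration.
import json


class MyTranslator:
    """Stand-in for a concrete metadata translator."""

    @classmethod
    def match(cls, record: dict) -> bool:
        # Cheap per-record check; no instance needed.
        return record.get("extractor_name") == "metalad_core"

    def translate(self, record: dict) -> dict:
        # Map the extracted record onto catalog-schema fields.
        return {"type": "dataset", "name": record.get("dataset_id", "")}


def translate_all(lines, translator):
    """Bypass discovery: apply one known translator to every record."""
    for line in lines:
        record = json.loads(line)
        yield translator.translate(record)


records = ['{"extractor_name": "metalad_core", "dataset_id": "abc"}']
out = list(translate_all(records, MyTranslator()))
```

The per-line `match()` loop disappears entirely; the caller takes responsibility for knowing which translator fits the input file.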
Kind of a duplicate issue:
I agree with #264 - I think it would be great to keep current behaviour (match per line) and have an option to fix the translator upfront (or from first line).
Another question aside from picking translators upfront: although we decided to make the translator's `match()` a class method, the translate code still instantiates a translator object. I understand that this was done because of concerns that `match()` may still need to rely on instance methods https://github.com/datalad/datalad-catalog/pull/246#issuecomment-1429677754 -- but I would still argue that in this case we lose all the benefit, and should either abandon the class method approach or expect all methods used for matching to be class methods (preferred).
Although, TBF, I should test whether there is any significant time to be gained :smile:
Yeah, no, I get almost identical times when adjusting translators and only switching lines 103-105 to create instance only after match :roll_eyes: Gains are to be made elsewhere, probably...
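The "almost identical times" observation can be reproduced in isolation with a quick timing sketch (standalone toy classes, not the real translators): instantiating a small Python object per record is cheap compared to the rest of the per-record work.

```python
# Toy timing comparison: instantiate per record vs. only after match.
# Illustrative only; the classes are stand-ins, not datalad-catalog code.
import timeit


class Translator:
    @classmethod
    def match(cls, record):
        return record.get("name") == "core"

    def translate(self, record):
        return {"name": record["name"]}


record = {"name": "core"}


def per_record_instance():
    t = Translator()            # created regardless of match outcome
    if t.match(record):         # classmethod is callable on the instance
        return t.translate(record)


def instance_after_match():
    if Translator.match(record):
        return Translator().translate(record)


t_a = timeit.timeit(per_record_instance, number=50_000)
t_b = timeit.timeit(instance_after_match, number=50_000)
```

On typical hardware the two timings differ by very little, which matches the conclusion above that the gains are to be made elsewhere.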
Comments by @mslw in https://github.com/psychoinformatics-de/rfd/pull/68:
We should improve the translator discovery process in `datalad-catalog` to e.g. only match a translator once and then somehow keep track of that, and not have to reinstantiate it when a new record requiring the same translator comes along.
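One way to sketch that caching idea (the cache key and all names are assumptions; the real implementation would need to decide what makes two records require "the same translator"):

```python
# Sketch: run translator discovery once per record "kind" and cache the
# instantiated translator, instead of re-matching for every record.
# Keying on extractor name + version is an assumed heuristic.


class Translator:
    @classmethod
    def match(cls, record):
        return record.get("extractor_name") == "metalad_core"

    def translate(self, record):
        return {"name": record["dataset_id"]}


_cache = {}


def get_translator(record, translator_classes):
    key = (record.get("extractor_name"), record.get("extractor_version"))
    if key not in _cache:
        for cls in translator_classes:
            if cls.match(record):
                _cache[key] = cls()
                break
        else:
            raise ValueError(f"no translator for {key}")
    return _cache[key]


r1 = {"extractor_name": "metalad_core", "extractor_version": "1",
      "dataset_id": "a"}
r2 = {"extractor_name": "metalad_core", "extractor_version": "1",
      "dataset_id": "b"}
t1 = get_translator(r1, [Translator])
t2 = get_translator(r2, [Translator])
# t1 and t2 are the same object: matching ran only once for this key
```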