Open jetlime opened 10 months ago
Thanks @jetlime for this suggestion!
Indeed, it happens sometimes that terms are only available as a downloadable file behind a link. The idea of obtaining the URL dynamically from the DOM is a smart answer to that problem 👍
The main question we need to answer to decide whether adding a new type of fetch is worthwhile is: are the location and DOM from which we obtain the link any more stable than the link itself? In the case at hand, DSA Transparency Reports are published every 6 months. We'd need to demonstrate that the location and DOM from which the link can be obtained change significantly less often than twice a year; otherwise the maintenance burden on collection maintainers will be the same, and we would have increased software complexity for nothing 😰
The next investigation steps I see are:
- Assess how often the location and DOM from which the link is obtained changed (`l`) vs how often the target of the link changed (`t`) in at least the last 2 years. If `t > e × l`, where `e` is some arbitrary multiplier encoding the effort it would take to implement this feature, we'll consider it 🙂
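As a quick worked example of that rule, here is a minimal Python sketch. The function name and the figures are made up for illustration; only the inequality `t > e × l` comes from the comment above.

```python
def worth_implementing(t: int, l: int, e: float) -> bool:
    """Return True when link-target changes (t) exceed e times location/DOM changes (l)."""
    return t > e * l


# Hypothetical figures over 2 years: the report link changed 4 times
# (one new report every 6 months), and e = 2 is an arbitrary effort multiplier.
print(worth_implementing(t=4, l=1, e=2))  # hosting page changed once: 4 > 2 * 1
print(worth_implementing(t=4, l=2, e=2))  # hosting page changed twice: 4 > 4 fails
```

With these assumed numbers the feature only pays off if the hosting page and its DOM changed at most once over the period.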
On some sites, such as the LinkedIn transparency reports, the terms of interest are located at dynamically named endpoints that could, for example, be determined by time (e.g. `October-2023-LinkedIn-DSA-Transparency-Report10.pdf`). These dynamic endpoints are in most cases found at fixed locations. It thus makes sense to introduce a new declaration term, `dynamic-fetch`. This term would fetch the document located at the dynamic endpoint `dynamic-fetch.variable`, defined at `dynamic-fetch.location`. It would be complementary to `fetch`. It could potentially be defined as follows:
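One possible reading of this proposal, sketched in Python purely for illustration: `location` is the fixed page, and `variable` is taken to be a regular expression matching the changing file name. That interpretation, the dictionary layout, the helper names and the example.com URL are all assumptions, not part of the tool's existing declaration format or API.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []  # list of raw href strings

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def resolve_dynamic_fetch(declaration: dict, html: str) -> Optional[str]:
    """Return the absolute URL of the first link matching `variable`, or None.

    `html` is the markup of the page at `location`, already downloaded
    by the caller, so this sketch performs no network access.
    """
    spec = declaration["dynamic-fetch"]
    pattern = re.compile(spec["variable"])
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        if pattern.search(href):
            # Relative hrefs are resolved against the fixed location.
            return urljoin(spec["location"], href)
    return None


# Hypothetical declaration and page markup.
declaration = {
    "dynamic-fetch": {
        "location": "https://example.com/transparency",
        "variable": r"DSA-Transparency-Report\d*\.pdf$",
    }
}
page = (
    '<a href="/about">About</a>'
    '<a href="/files/October-2023-LinkedIn-DSA-Transparency-Report10.pdf">Report</a>'
)
print(resolve_dynamic_fetch(declaration, page))
```

If the page layout changes so that no link matches, the function returns `None`, which is the case a maintainer would still have to handle by hand.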
As I am pretty new to this tool, I would be happy to hear some feedback about this proposition! If you share my vision, I would be happy to implement it :)