Works for me, although I'm not a massive fan of the notion of "namespaced stage names" - they're super confusing to follow. Perhaps the namespacing could be done in the two functions that actually need it (timeout and check timeout)?
The main user of the namespaced stage names is `get_stages`. The problem without properly namespaced stages is that if the `fetch` stage of crawler1 is on timeout, every `fetch` stage across all crawlers ends up on timeout too, which we don't want.
This is where the meaning of a stage differs between Aleph and Memorious. In Aleph, a stage named `INGEST` does the same thing for all datasets. But in Memorious, a stage named `fetch` can do different things in different crawlers. With namespaced stage names, we can rate limit just the `fetch` stage of crawler1 instead of rate limiting the `fetch` stage in all crawlers.
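To make the difference concrete, here is a minimal sketch of per-crawler stage timeouts. It is not the actual Memorious code: `namespaced_stage`, `StageTimeouts`, the `crawler.stage` key format and the in-memory dict are all hypothetical stand-ins, chosen only to show that a timeout keyed on the namespaced name affects one crawler's stage and nothing else.

```python
import time


def namespaced_stage(crawler: str, stage: str) -> str:
    """Build a key unique to one stage of one crawler (hypothetical format)."""
    return f"{crawler}.{stage}"


class StageTimeouts:
    """In-memory stand-in for whatever store tracks stage timeouts."""

    def __init__(self):
        # Maps a namespaced stage name to the time its timeout expires.
        self._until = {}

    def timeout(self, crawler, stage, seconds, now=None):
        """Put a single crawler's stage on timeout for `seconds`."""
        now = time.monotonic() if now is None else now
        self._until[namespaced_stage(crawler, stage)] = now + seconds

    def check_timeout(self, crawler, stage, now=None):
        """Return True if this crawler's stage is currently on timeout."""
        now = time.monotonic() if now is None else now
        until = self._until.get(namespaced_stage(crawler, stage))
        return until is not None and until > now


timeouts = StageTimeouts()
timeouts.timeout("crawler1", "fetch", seconds=60, now=0.0)
crawler1_limited = timeouts.check_timeout("crawler1", "fetch", now=10.0)
crawler2_limited = timeouts.check_timeout("crawler2", "fetch", now=10.0)
```

In this sketch the namespacing happens only inside `timeout` and `check_timeout`, so callers keep passing the plain stage name; whether that works for `get_stages` as well is a separate question.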