Open maxachis opened 1 month ago
@maxachis I think this is coming from/used by code in the automatic-archives repo: https://github.com/Police-Data-Accessibility-Project/automatic-archives
@josh-chamberlain Related to what is discussed in this issue, I think setting up an archived_data_sources
table in the database will help clarify logic and keep data_sources
from being too populated.
Also (and possibly relevant to the above point), I would like to know by what conditions we determine whether a data source should be archived, or if every data source should be archived.
@maxachis
Also (and possibly relevant to the above point), I would like to know by what conditions we determine whether a data source should be archived, or if every data source should be archived.
If I'm following the archive logic correctly in the SQL query, in plain english: A source is archived if: The source was never archived before OR the source is updated regularly AND the source url is not broken AND the source url is not null
So we basically don't want to waste time archiving sources that are never updated and have already been archived or are broken
@EvilDrPurple nailed it! want to work on this issue / the other one linked? this is not related to v2, but it's still pretty important that it works correctly and clearly. One day we will get more sophisticated than internet archive.
@maxachis agreed. it was built by someone without a ton of experience, so it makes sense that there would be room for improvement and clarity! at most, data_sources
would be linked to archived_data_sources
so we know which sources we're talking about, and can answer questions about when a source was last archived and stuff.
I'd consider renaming this issue to something like "separate archives logic from data sources app". Ideally, our API would just exist
, and automatic-archives could do everything it needs to do with the API. automatic-archives
is an example of the kind of project we would love for other people to build ~independently using our API.
@EvilDrPurple nailed it! want to work on this issue / the other one linked? this is not related to v2, but it's still pretty important that it works correctly and clearly. One day we will get more sophisticated than internet archive.
@josh-chamberlain yeah sure! I believe I may need access to the database but I can start on it
The Archives resource and its attendant functionality is unclear to me on several levels:
I'd like to figure out the answer to these questions, and possibly refactor the Archives resource for improved clarity, or update the documentation.
related to https://github.com/Police-Data-Accessibility-Project/automatic-archives/issues/21