edgi-govdata-archiving / web-monitoring-processing

Tools for accessing, "diff"-ing, and analyzing archived web pages
https://edgi-govdata-archiving.github.io/web-monitoring-processing
GNU General Public License v3.0

Testing against sensitive data #87

Open danielballan opened 7 years ago

danielballan commented 7 years ago

We will want to run tests against real lists of changes that were flagged for review. Some of the elements of these lists are already public because they are the subject of old EDGI reports. Some are not public -- they were interesting enough to trigger careful review, but ultimately did not generate a report. Finally, there will be an incoming stream of new entries to the list that are not yet sorted into one of those two categories.

After a conversation with @trinberg, I propose the following system:

attn @janakrajchadha

janakrajchadha commented 7 years ago

I had a couple of questions here.

danielballan commented 6 years ago

Apologies for my overdue reply, @janakrajchadha.

> Do we have an existing list of changes which generated old reports?

I'm not sure that we have uuids for them, but we have Versionista links which we can convert into uuids.

> "skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

I'm imagining a conditional skip (such as with pytest.mark.skipif) that checks for an env variable containing authentication info.
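A minimal sketch of that kind of conditional skip might look like the following. The variable name `WEB_MONITORING_TEST_TOKEN`, the helper `has_credentials`, and the test body are hypothetical and only illustrate the pattern; nothing in this thread fixes those names.

```python
import os

import pytest

# Hypothetical variable name: the thread does not say which variable
# would carry the authentication info.
AUTH_ENV_VAR = "WEB_MONITORING_TEST_TOKEN"


def has_credentials(environ=os.environ):
    """Return True if the (assumed) auth variable is set and non-empty."""
    return bool(environ.get(AUTH_ENV_VAR))


# Reusable marker: the condition is evaluated once, when this module is
# imported, so the variable must be set before pytest starts.
requires_sensitive_data = pytest.mark.skipif(
    not has_credentials(),
    reason=f"{AUTH_ENV_VAR} is not set; skipping tests that need non-public data",
)


@requires_sensitive_data
def test_flagged_changes_are_reachable():
    # Placeholder body: a real test would use the credentials to fetch
    # the non-public list of flagged changes.
    token = os.environ[AUTH_ENV_VAR]
    assert token
```

With this shape, a contributor without credentials sees the test reported as skipped along with the reason, while a CI job that has the variable configured as a secret runs it normally.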

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

Mr0grog commented 5 years ago

This is still relevant — outside contributors really need decent data they can grapple with. However, is the long trudge towards a more public staging server the better solution here? (See also edgi-govdata-archiving/web-monitoring-db#34 and edgi-govdata-archiving/web-monitoring-ui#220)

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.