Testing against sensitive data

danielballan commented 7 years ago

We will want to run tests against real lists of changes that were flagged for review. Some of the elements of these lists are already public because they are the subject of old EDGI reports. Some are not public -- they were interesting enough to trigger careful review, but ultimately did not generate a report. Finally, there will be an incoming stream of new entries to the list that are not yet sorted into one of those two categories.

After a conversation with @trinberg, I propose the following system:

Make public a list of changes that generated old reports so that any interested developer can use them for testing.
Maintain a private, more complete list. Only distribute this to established contributors.
Use a secure token to make the complete list available to CI for testing. Apply "skiptest" features on these tests so that developers can run all the tests locally even if they don't have access to the complete list.

attn @janakrajchadha

janakrajchadha commented 7 years ago

I had a couple of questions here.

Do we have an existing list of changes which generated old reports?
"skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

danielballan commented 6 years ago

Apologies for my overdue reply, @janakrajchadha.

> Do we have an existing list of changes which generated old reports?

I'm not sure that we have uuids for them, but we have Versionista links which we can convert into uuids.

> "skiptest" features would also skip the tests in CI and accessing the complete list would not be useful. Am I missing something here?

I'm imagining a conditional skip (such as with pytest.mark.skipif) that checks for an environment variable containing authentication info.
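
A minimal sketch of that conditional skip might look like the following. The variable name PRIVATE_CHANGES_TOKEN, the helper has_private_access, and the placeholder data are all assumptions made for illustration; nothing in this thread specifies how the token would be named or how the complete list would be fetched.

```python
import os

import pytest

# Hypothetical name; the thread does not say which variable CI would set.
TOKEN_ENV_VAR = "PRIVATE_CHANGES_TOKEN"


def has_private_access(environ=None):
    """Return True when the (assumed) token variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(TOKEN_ENV_VAR))


# Evaluated once at import time: skip when no token is configured.
requires_private_list = pytest.mark.skipif(
    not has_private_access(),
    reason=f"{TOKEN_ENV_VAR} is not set; the complete list is unavailable",
)


def test_public_changes():
    # Runs everywhere: uses only the public list.
    public_changes = ["placeholder-change"]  # stand-in for the public list
    assert len(public_changes) > 0


@requires_private_list
def test_private_changes():
    # Runs only where the token is present, e.g. in CI.
    token = os.environ[TOKEN_ENV_VAR]
    assert token  # a real test would use it to fetch the complete list
```

With this arrangement, a developer without the token would see test_private_changes reported as skipped rather than failed, while a CI job that exports the variable from its secret store would run both tests.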

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

Mr0grog commented 5 years ago

This is still relevant — outside contributors really need decent data they can grapple with. However, is the long trudge towards a more public staging server the better solution here? (See also edgi-govdata-archiving/web-monitoring-db#34 and edgi-govdata-archiving/web-monitoring-ui#220)

edgi-govdata-archiving / web-monitoring-processing

Testing against sensitive data #87