Open Mr0grog opened 6 years ago
At this week’s analyst meeting, CAPTCHAs came up as another constantly changing thing that is hopefully easy to identify.
Also:
More far out:
We should probably turn this issue into an umbrella/epic issue for all these different ideas and pieces of work.
From some BLM examples @jschell42 sent me:
- `id`, `class`, `name` attributes (and moving those attributes).
- `title` attributes probably should be accounted for somehow, but a) are hard to see and b) probably aren’t a big deal (so they should only matter a tiny bit, if they matter at all).
- `title` or maybe any attribute? (Might need a special list of attributes that have meaning just by their presence, like `checked`.)
- `<meta>` modified date? e.g.:
<meta name="dcterms.modified" content="2018-06-11T11:58:04-04:00" />
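A rough sketch of how the modified-date case could be detected, using only Python's standard `html.parser`. The function names and the `IGNORED_META_NAMES` set are hypothetical, and only `dcterms.modified` comes from the example above; other sites may use different meta names.

```python
from html.parser import HTMLParser

# Assumption: meta names that carry nothing but a "last modified" timestamp.
# Only "dcterms.modified" is taken from the example in this thread.
IGNORED_META_NAMES = {"dcterms.modified"}


class _EventCollector(HTMLParser):
    """Record parse events, skipping <meta> tags that only hold a modified date."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if tag == "meta" and name in IGNORED_META_NAMES:
            return  # drop the timestamp-only tag from the comparison
        self.events.append(("start", tag, attr_map))

    def handle_startendtag(self, tag, attrs):
        # Treat self-closing tags (<meta ... />) like plain start tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(html):
    collector = _EventCollector()
    collector.feed(html)
    collector.close()
    return collector.events


def only_modified_date_changed(old_html, new_html):
    """True if the two versions differ, but only in ignored <meta> tags."""
    return old_html != new_html and parse_events(old_html) == parse_events(new_html)
```

A change flagged this way could be given a very low priority rather than hidden outright, since the date itself may still be worth showing to analysts.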
There’s definitely an interesting thing here I wasn’t thinking about before… we could make a big split in prioritization based simply on textual (+ images and such) content changes. I can see some super-useful annotation data we could display for analysts (especially in their sheets) like:
Some example diffs:
Another example of something that should really be totally ignored: https://monitoring.envirodatagov.org/page/c4328d30-cada-452f-8642-4bff721f5fc2/9a448c37-9285-4107-9ffd-ea72214561a4..a8fab661-07bb-4409-92f7-f73deadf4e29 (change to class attribute)
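For changes like that one, a first pass could compare only the visible text of the two versions. This is a minimal sketch with the standard library; `visible_text` and `classify_change` are made-up names, and skipping only `<script>` and `<style>` is an assumption about what counts as "not visible".

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace so re-indented markup does not count as a text change.
    return " ".join("".join(extractor.chunks).split())


def classify_change(old_html, new_html):
    """Return "identical", "markup-only", or "text" for a pair of versions."""
    if old_html == new_html:
        return "identical"
    if visible_text(old_html) == visible_text(new_html):
        return "markup-only"
    return "text"
```

With this, a diff that only swaps `class="intro"` for `class="lead"` comes back as `"markup-only"`, which could feed the textual vs. non-textual split mentioned earlier.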
As a first test of all the things needed to automatically rate a change’s significance or priority, let’s start with something simple that looks for changes that we can pretty confidently say aren’t meaningful:
- `'` → `’`
- `title`, `alt`, `href`, or `src` (any others?) are not important

Example: https://monitoring.envirodatagov.org/page/b2b0b8cb-5e9b-4178-91c0-b8cb4466d2bd/b76dd1ab-a7aa-41d6-89f3-c45117a80dc5..2b55beed-db97-4249-b30a-600f61d94eb5
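The quote-style check could be sketched roughly like this. The mapping below is an assumption about which characters to treat as equivalent (only `'` → `’` comes from the list above), and the function names are hypothetical.

```python
# Assumption: curly quotes are treated as equivalent to their straight forms.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text):
    return text.translate(QUOTE_MAP)


def only_quote_style_changed(old_text, new_text):
    """True if the texts differ, but become equal once quotes are normalized."""
    return (
        old_text != new_text
        and normalize_quotes(old_text) == normalize_quotes(new_text)
    )
```

This works on plain text, so it would run on whatever text extraction step comes first rather than on raw HTML.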
This is an easy analysis to do (and covers a lot of the kinds of changes I think we see), so it’s a good way to make sure we’ve built out: