edgi-govdata-archiving / datavis-projects

Ongoing work on visualizations of coverage & other relevant data.
GNU General Public License v3.0

Changing Data Categories #1

Closed: Blackglade closed 7 years ago

Blackglade commented 7 years ago

As currently displayed on the alpha site (https://alpha.archivers.space/coverage), the coverage page is organized as follows:

Protocol (HTTP(s)/FTP) => Website => sub-category => files

I think we should change it to something that feels more relevant to the end user, such as:

Agency/Group => sub-category => files

I am personally of the opinion that anyone who would be viewing this won't really care about the protocol when first viewing the interactive model. We can always include that info later. I think a better solution would be to divide it into 3 separate categories based on just the data I've seen (EPA, Research Institutes, and Other).
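A minimal sketch of the regrouping proposed above, assuming the coverage data is available as a flat list of URLs. The host-to-category mapping, the `research.example.org` host, and the choice of the first path segment as the sub-category are all hypothetical placeholders, not how the coverage page is actually built:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical mapping from a hostname suffix to a top-level category.
# The real list of agencies/groups would come from the project data.
CATEGORY_BY_HOST_SUFFIX = {
    "epa.gov": "EPA",
    "research.example.org": "Research Institutes",
}


def categorize(url: str) -> str:
    """Return the top-level category for a URL, ignoring its protocol."""
    host = urlparse(url).hostname or ""
    for suffix, category in CATEGORY_BY_HOST_SUFFIX.items():
        if host == suffix or host.endswith("." + suffix):
            return category
    return "Other"


def build_tree(urls):
    """Group URLs as category => sub-category => files."""
    tree = defaultdict(lambda: defaultdict(list))
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Assumption: the first path segment stands in for the sub-category.
        sub_category = segments[0] if len(segments) > 1 else "(root)"
        tree[categorize(url)][sub_category].append(url)
    return {category: dict(subs) for category, subs in tree.items()}
```

With this shape, an HTTPS URL and an FTP URL on the same agency's hosts land under the same category, and the protocol can still be recovered from each URL if it needs to be shown later.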

Should probably tag people: @b5 @titaniumbones @dcwalk

b5 commented 7 years ago

Great! Totally agreed on this change. Users don't care about HTTP, so let's de-emphasize that.

I've just created a new issue that will provide the technical infrastructure for what we're trying to achieve here. While we work on that, there are a few things to think about:

Getting acquainted with primers - @Blackglade we haven't had a proper chance to talk about primers yet. When you have a chance, ping me on Slack & I can get you more info on them. I think they may help inform the strategy you're envisioning here.

Files vs HTML - these two things often get mixed together in coverage reporting. While we're archiving, the main thing we're after is files, but they often get heavily mixed in with web pages (HTML documents). For example, many pages link to datasets, and the pages themselves contain lots of relevant metadata, so we want to (and do) archive those as well. The main question for visualization is: should we represent the difference between pages & files, and if so, how?

In case I forget to correct my jargon, I tend to refer to "files" as content, which is anything that isn't HTML, as opposed to pages, which are HTML.
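One way to make the pages vs. content split concrete is to classify each archived URL by its `Content-Type` header. This is only a sketch under that assumption: the function names are hypothetical, and treating XHTML as a page is a guess on top of the "pages are HTML" definition above:

```python
from collections import Counter

# Media types treated as "pages". Including XHTML is an assumption;
# the definition above only says that pages are HTML.
PAGE_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def classify(content_type: str) -> str:
    """Return "page" for HTML documents and "content" for everything else."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "page" if media_type in PAGE_MEDIA_TYPES else "content"


def count_kinds(content_types):
    """Tally how many pages and content files a list of headers contains."""
    return Counter(classify(ct) for ct in content_types)
```

A visualization could then color or size nodes by these two counts instead of mixing pages and files into one number.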

b5 commented 7 years ago

Quick recap from a conversation @Blackglade & I had last night: we're planning to focus on higher-level insights for visualization as a starting point. This means visualizing coverage analysis for archivers 2.0 primers & sources, as opposed to trying to visualize individual URLs, which gets a little unruly.
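To illustrate what "higher-level" could mean in practice, here is a rough sketch that rolls per-URL records up into one coverage ratio per primer or per source. The record shape (`primer`, `source`, and `archived` keys) is hypothetical and not taken from the archivers 2.0 data model:

```python
from collections import defaultdict


def coverage_by_group(records, key):
    """Return the fraction of archived URLs for each value of `key`.

    `records` is an iterable of dicts with a boolean "archived" field
    and a grouping field such as "primer" or "source" (assumed names).
    """
    totals = defaultdict(int)
    archived = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["archived"]:
            archived[group] += 1
    return {group: archived[group] / totals[group] for group in totals}
```

Calling `coverage_by_group(records, "primer")` and `coverage_by_group(records, "source")` on the same records gives two levels of the same view, so a chart only has to draw a handful of ratios rather than every URL.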

@Blackglade is going to read through some of the current public primers to get a feel for different areas and refocus visualization strategies around this. We'll do another feedback session in the near future to get into the nitty-gritty.

We both agreed that it would be pertinent to consider whether or not primers are the right axis to convey coverage analysis. I personally think they make sense, but opinions welcome!