edgi-govdata-archiving / archivers.space

🗄 Event data management app used at DataRescues
https://www.archivers.space/
GNU Affero General Public License v3.0

"Crawlable" phase --> What does that mean exactly? #25

Closed kmcculloch closed 7 years ago

kmcculloch commented 7 years ago

From @khdelphine on February 10, 2017 15:21

http://www.archivers.space/urls?phase=crawlable

Copied from original issue: b5/pipeline#58

kmcculloch commented 7 years ago

From @khdelphine on February 10, 2017 15:22

This probably needs to be explained to the user in some way.

kmcculloch commented 7 years ago

From @dcwalk on February 12, 2017 2:26

@khdelphine my understanding is these URLs are those marked:

"Do not harvest. All data is small, unstructured, and on a page crawlable by the Internet Archive."

kmcculloch commented 7 years ago

From @dcwalk on February 12, 2017 2:26

Are you still thinking the language could be clarified?

kmcculloch commented 7 years ago

From @khdelphine on February 14, 2017 12:53

@dcwalk Thank you for clarifying. That's what I thought but I was not 100% sure.

If we could add one line of explanation on that page (http://www.archivers.space/urls?phase=crawlable) before the list of URLs, that would be great. For instance: "These URLs do not need to be harvested. They were submitted to the Internet Archive to be crawled."

In general, adding such a quick explanation for each phase (Research, Harvest, Bag...) would go a long way to making the application more self-explanatory, I think.

Does that make sense?

kmcculloch commented 7 years ago

This could also be addressed in documentation. I did a search on the term "crawlable" but didn't find an explicit explanation of that stage in the pipeline: https://datarefuge.github.io/workflow/search.html?q=crawlable

The documentation might benefit from a general UI overview page: one that just annotates all of the elements on the URLs page, rather than burying all of our "what is this?" explanations inside the workflow descriptions. Of course, that also means a few more documentation pages to maintain...

khdelphine commented 7 years ago

I really think the best option would be to add a one-line explanation of each URL phase right above where the list of URLs starts (as I suggested above). It would make the interface more self-explanatory.

dcwalk commented 7 years ago

@kmcculloch -- we integrated a video walkthrough of the interface into the documentation on the Researchers and Harvesters page.

I've created an issue: https://github.com/datarefuge/workflow/issues/92

dcwalk commented 7 years ago

Could we maybe have a one sentence description at the top of each phase?

khdelphine commented 7 years ago

I was thinking of something like the picture attached. Each time you click a different phase, it would show a different one-sentence explanation. Does that make sense?

[Image: Mockup]

khdelphine commented 7 years ago

We could even add a small "i" info icon at the end of each sentence that would link to a specific page of the documentation for each one.
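As a rough illustration of the idea above (one sentence per phase, plus an optional "i" link to that phase's documentation), here is a minimal sketch in Python. It is not the application's actual code. The names `PhaseInfo`, `PHASE_INFO`, and `phase_banner` are hypothetical, and the documentation URL is a placeholder. Only the "crawlable" sentence is taken from this thread; text for the other phases is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhaseInfo:
    """Hypothetical record for one URL phase."""
    description: str          # one-sentence explanation shown above the URL list
    docs_url: Optional[str]   # target of the "i" info icon, if a docs page exists


# Assumption: phases are keyed by the same lowercase value used in ?phase=...
# Only "crawlable" is filled in, using the wording proposed in this thread.
PHASE_INFO = {
    "crawlable": PhaseInfo(
        description=(
            "These URLs do not need to be harvested. "
            "They were submitted to the Internet Archive to be crawled."
        ),
        docs_url="https://example.org/docs/crawlable",  # placeholder URL
    ),
}


def phase_banner(phase: str) -> str:
    """Return the text to show above the URL list, or "" for unknown phases."""
    info = PHASE_INFO.get(phase.lower())
    if info is None:
        return ""
    if info.docs_url:
        # Plain-text stand-in for the info icon and its link.
        return f"{info.description} (i: {info.docs_url})"
    return info.description


if __name__ == "__main__":
    print(phase_banner("crawlable"))
```

Keeping the sentences in a single lookup table means that adding or rewording a phase description is a one-line change, and a phase without a documentation page can still show its sentence without the icon.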

kmcculloch commented 7 years ago

This seems totally doable. It's on the board: https://github.com/edgi-govdata-archiving/archivers.space/projects/2

khdelphine commented 7 years ago

Yay! Thanks.

Note that at the moment we have documentation pages for each phase except Crawlable. So I guess we need to quickly put one together. It should be a pretty short one.

khdelphine commented 7 years ago

I just created a Crawlable page. It is in this PR: https://github.com/datarefuge/workflow/pull/95

khdelphine commented 7 years ago

Here is some proposed text for each phase. Feel free to edit. (Note the "i" stands for an "Info icon" like this http://icons.iconarchive.com/icons/hopstarter/soft-scraps/256/Button-Info-icon.png)

kmcculloch commented 7 years ago

This has been merged to master and deployed.