archivesunleashed / auk-notebooks

Jupyter notebooks to assist in creating additional analysis and visualizations of Archives Unleashed Cloud derivatives.
https://cloud.archivesunleashed.org

Not all collections will produce all derivatives #10

Closed ruebot closed 5 years ago

ruebot commented 5 years ago

The notebook(s) need to be updated to handle the case where particular derivatives don't exist. Right now, they expect all possible derivatives to be supplied.

ruebot commented 5 years ago

At minimum, we produce a fulltext file, a domains file, and a graphml file.

We may or may not produce a gexf file depending on how large the graph is, and a user may not have a filtered_text.zip since we introduced that relatively recently.

So, the question is: where is the best place to wrap a couple of things in a try/except and catch FileNotFoundError exceptions?

@greebie let me know what you think, and if you want to take this on. If you just want to let me know what you think are the best places, I can take care of implementing it.

greebie commented 5 years ago

I think the best way is to create a check function that takes a function and runs it inside a try/except.

For instance:

def collectionExists(fn):
    # Returns True if fn() runs without raising FileNotFoundError.
    check = True
    try:
        fn()
    except FileNotFoundError:
        check = False
    return check

Then you can add: variable = get_text([params]) if collectionExists(get_text) else ["No file available"]
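As a rough, runnable sketch of the check-then-call pattern above: the names `collection_exists`, `make_reader`, and `load_or_placeholder` are hypothetical and not taken from the notebook, and the reader simply returns the lines of a text file, which is only an assumption about what `get_text` does.

```python
from pathlib import Path


def collection_exists(fn):
    """Return True if calling fn() does not raise FileNotFoundError."""
    try:
        fn()
    except FileNotFoundError:
        return False
    return True


def make_reader(path):
    """Hypothetical helper: build a zero-argument reader for one derivative file."""
    def read():
        # Path.read_text raises FileNotFoundError when the file is missing.
        return Path(path).read_text(encoding="utf-8").splitlines()
    return read


def load_or_placeholder(path):
    """Return the file's lines, or a placeholder list if the file is absent."""
    reader = make_reader(path)
    return reader() if collection_exists(reader) else ["No file available"]
```

One thing to note about this design: when the file exists, the reader runs twice, once for the check and once for the real call, so it may be worth reconsidering for large derivatives.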

To make this work generally, we will need to include a network function that looks something like this:

import networkx as nx

def getGexf(file=auk_gephi):
    return nx.read_gexf(file)  # import the graph

greebie commented 5 years ago

I am willing to take this on also. :)

ianmilligan1 commented 5 years ago

Great, thanks @greebie - I assigned you to this. Seems like you've got a good plan above.

greebie commented 5 years ago

I'm going to have to wait for #30, otherwise I'll have merging issues.

greebie commented 5 years ago

Okay - close to a PR. The approach I took was very similar, but I decided to just show empty graphs on a failure, and I included a file checker script that checks for the existence of all derivative files.

At this stage the notebook is not using either graphml or the filtered text derivatives, but I included functions to check those for future use.
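For illustration only, a file checker along these lines might look like the sketch below. It is not the script from the PR: the function names and every file name except `filtered_text.zip` are hypothetical, and the required/optional split simply follows the description earlier in this thread.

```python
from pathlib import Path

# Hypothetical file names: the thread names the derivative types,
# not the actual paths the notebook reads.
REQUIRED = {
    "fulltext": "fulltext.txt",
    "domains": "domains.csv",
    "graphml": "network.graphml",
}
OPTIONAL = {
    "gexf": "network.gexf",
    "filtered_text": "filtered_text.zip",
}


def check_derivatives(data_dir):
    """Map each derivative name to True if its file exists in data_dir."""
    base = Path(data_dir)
    expected = {**REQUIRED, **OPTIONAL}
    return {name: (base / filename).is_file() for name, filename in expected.items()}


def missing_required(data_dir):
    """List the required derivatives that are absent from data_dir."""
    status = check_derivatives(data_dir)
    return [name for name in REQUIRED if not status[name]]
```

A notebook cell could call `check_derivatives` once at the top and skip, or draw an empty graph for, any section whose derivative is reported as missing.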

ianmilligan1 commented 5 years ago

Closed with aed4e11e2f0ad7099f2a5cf7ebc632af28066e54.