mfenner opened this issue 4 years ago
@mfenner, I have now prototyped the above (see mybinder). The prototype notebook actually contains three different queries: "Machine learning", "COVID" and "Shakespeare", in order to contrast different trends.
I'm not yet sure how to paginate through results when issuing multiple GraphQL queries, but at worst I can always move back to a single "Machine learning" query once the above pagination fix in GraphQL has been deployed.
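One possible way to paginate several searches at once is GraphQL aliases. Each search gets its own alias and its own `after` cursor variable, and only searches that still have pages are included in the next request. This is a sketch under assumptions: the `dissertations(query:, first:, after:)` field, its `pageInfo`/`nodes` selection, and the helper names `build_request` and `advance` are hypothetical placeholders, not confirmed DataCite schema or notebook code.

```python
# Sketch: page through several searches in one GraphQL request using aliases.
# ASSUMPTION: a `dissertations(query:, first:, after:)` connection that returns
# `pageInfo { hasNextPage endCursor }` and `nodes { id }`; adjust to the real schema.


def build_request(pending, page_size=100):
    """Build one query document plus variables for every still-pending search.

    `pending` maps an alias (a valid GraphQL name) to (search_term, after_cursor);
    after_cursor is None for the first page and is sent as GraphQL null.
    """
    var_defs, fields, variables = [], [], {}
    for alias, (term, after) in pending.items():
        var_defs.append(f"$q_{alias}: String!, $after_{alias}: String")
        header = (
            f"  {alias}: dissertations(query: $q_{alias}, "
            f"first: {page_size}, after: $after_{alias}) {{"
        )
        fields.append("\n".join(
            [header, "    pageInfo { hasNextPage endCursor }", "    nodes { id }", "  }"]
        ))
        variables[f"q_{alias}"] = term
        variables[f"after_{alias}"] = after
    document = "query(" + ", ".join(var_defs) + ") {\n" + "\n".join(fields) + "\n}"
    return document, variables


def advance(pending, response):
    """Keep only searches whose page reported both hasNextPage and an endCursor."""
    remaining = {}
    for alias, (term, _) in pending.items():
        info = response[alias]["pageInfo"]
        if info["hasNextPage"] and info["endCursor"]:
            remaining[alias] = (term, info["endCursor"])
    return remaining
```

The document and variables could be handed to whichever client the notebook uses (for example gql's `Client.execute` with `variable_values`), repeating until `advance` returns an empty dict. Searches that finish early simply drop out of later requests.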
@mfenner, I have now added Markdown documentation to the notebook for user story 5. Please note that the top Markdown table appears borderless in JupyterLab and on mybinder.org, but for some reason not on GitHub.
I have also replaced the COVID query with Ebola, to meet the requirement of showing a more interesting growth trend in the number of dissertations in recent years.
Finally, I have added pie charts showing the number of dissertations per repository. These revealed the source of the German words in the Shakespeare word cloud: the Universities of Heidelberg and (prominently) Vienna feature among the repositories.
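Both the growth trend and the per-repository pie charts reduce to simple counts over the retrieved records. A minimal sketch, assuming each dissertation has already been flattened into a dict with hypothetical `publicationYear` and `repository` keys (the real response shape may differ):

```python
from collections import Counter


def dissertations_per_year(records):
    """Count dissertations per publication year, sorted by year (for a trend plot).

    Records without a publicationYear are skipped.
    """
    counts = Counter(r["publicationYear"] for r in records if r.get("publicationYear"))
    return dict(sorted(counts.items()))


def dissertations_per_repository(records, top_n=5):
    """Count dissertations per repository for a pie chart.

    Missing repository names are counted as "Unknown"; everything outside the
    top_n repositories is folded into a single "Other" slice.
    """
    counts = Counter(r.get("repository") or "Unknown" for r in records)
    top = counts.most_common(top_n)
    other = sum(counts.values()) - sum(n for _, n in top)
    if other:
        top.append(("Other", other))
    return top
```

The pairs returned by `dissertations_per_repository` can be unzipped into sizes and labels for `matplotlib.pyplot.pie(sizes, labels=labels)`. Folding the long tail into "Other" keeps the chart readable while still showing which repositories dominate a query.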
@mfenner, so that Frances can work on feedback for this user story, I have switched off the pagination functionality (which causes the 'Invalid AST Node' error) until we work out how to paginate through results with gql. In addition, I am only fetching the first 100 results, because fetching 200 or more causes a 'Cannot return null for non-nullable field Creator.name' exception.
Feedback on Documentation aspects:
Cell 1 -
Consider inserting some visual representation of the notebook's results; as there are many outputs in this one, perhaps just one visual.
I think the introductory sentences might be rephrased to be a bit clearer:
'This notebook uses the DataCite GraphQL API to retrieve all dissertations for three different queries: Shakespeare, Machine learning and Ebola. These queries illustrate trends in the number of dissertations created over time.'
Beneath 'Define and run GraphQL query'
Cell 115 - is the comment at the beginning of the cell correct? 'Find all outputs FREYA project…'
@FrancesMadden, thank you for the comments. I have just pushed a change to address them - please let me know if anything is still outstanding.
@mfenner, I've just pushed the change to page through results; the notebook now retrieves, e.g., all ~1700 records for the 'Machine Learning' query. The cursors on the GraphQL side work as expected, though for the 'Shakespeare' query (114 results) I observed the following (retrieving 100 results per page):
"pageInfo": { "hasNextPage": true, "endCursor": "MTU3MzMyMDEwMjAwMCwxMC4yNTM2NS90aGVzaXMuNjcwMg" },
"pageInfo": { "hasNextPage": true, "endCursor": "MTU5MTgwMzI5ODAwMCwxMC4yNTYwMi9nb2xkLjAwMDI4NzUw" },
"pageInfo": { "hasNextPage": true, "endCursor": null },
whereas after the second page I would have expected:
"pageInfo": { "hasNextPage": false, "endCursor": null },
I've coded around this, but in future it would IMHO be more intuitive to set "hasNextPage" to false when there is no "endCursor".
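A minimal sketch of one way to code around this: keep following `endCursor`, but stop as soon as either `hasNextPage` is false or `endCursor` is missing. `fetch_page` is a hypothetical stand-in for the actual GraphQL call; it takes the `after` cursor (None for the first page) and returns the connection's `nodes` and `pageInfo`. This is not necessarily how the notebook implements it.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Shape assumed for one page: {"nodes": [...], "pageInfo": {"hasNextPage": bool, "endCursor": str | None}}
Page = Dict[str, Any]


def paginate(fetch_page: Callable[[Optional[str]], Page], max_pages: int = 1000) -> Iterator[Any]:
    """Yield every node, following endCursor until no further page is available."""
    after: Optional[str] = None
    for _ in range(max_pages):  # safety cap in case the server never signals the end
        page = fetch_page(after)
        yield from page["nodes"]
        info = page["pageInfo"]
        # Workaround: treat a missing endCursor as the last page,
        # even if hasNextPage is still reported as true.
        if not info["hasNextPage"] or not info["endCursor"]:
            return
        after = info["endCursor"]
```

With the behaviour observed above, this still issues a third, empty request before stopping. Also stopping when a page returns fewer nodes than requested would avoid that request, at the cost of relying on the page size.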
[Figure: machine learning]
NB: This requires a fix for pagination in GraphQL, which is underway (https://github.com/datacite/lupo/pull/511) and should be ready next week.