ScienceCommons / curate_science

Transparency & credibility curation products for all research stakeholders.
https://CurateScience.org
MIT License
12 stars 6 forks source link

Article editor modal window #10

Closed eplebel closed 5 years ago

eplebel commented 5 years ago

specs for new/edit article info modal window for within-card editing of article information or for creating a new article (all fields blank in this case): new_edit-article-info

no deep-linking, no auto-complete fields. the only "smart field" is the DOI field whereby it should give an error (on save) if DOI is already in the DB

this should be implemented exactly as is (given that this will currently mostly be for internal use only; we can improve the UI/UX later). please note that the font and padding should be very small/compact so that all fields can be visible on a large desktop screen (crucial to maximize data entry/curation efficiency).

onejgordon commented 5 years ago

Got it. So, to confirm, for this shallow version (no auto-completes) we will be recording data in a way that's different, for many fields, than the current data model. E.g. Target effects will not be stored within a separate Effect() model, but as text instead?

Same confirmation for authors, journal, etc. I'm making this assumption because without the autocomplete, if we create new records when we save curated articles, we'll end up with duplicates via misspellings, different formatting, etc.

Other data model changes I'm noticing:

Other changes to confirm:

@eplebel am I interpreting correctly? Since a number of significant data model changes are happening in this version, is your recommendation that @alexkyllo take the lead on schema changes, and I focus on the UI?

eplebel commented 5 years ago

we will be recording data in a way that's different, for many fields, than the current data model. E.g. Target effects will not be stored within a separate Effect() model, but as text instead?

correct.

if we create new records when we save curated articles, we'll end up with duplicates via misspellings, different formatting, etc.

also correct. we can cleanup/merge these later when we implement the auto-completes (we'll be scaling up slowly so this will be feasible).

Are replicates a new data model, or similar to Study() in the prior version? I see that many can be created per article

correct, but i've decided to drop this for now (given these are pretty rare; will add later).

What is the # of reps (farthest left input in the original article case), and how is this different than the # of replicate rows that are added in the UI?

"number of reps" is the number of replication studies reported in a replication article (which is needed now because before we'd just count the # of studies curated within a replication article, but now we're not curating at the study-level). these are different from # of replicates, which are the number of known replication studies (across articles) of effects reported in an original article (e.g. http://curatescience.org/author/daniel.lakens.html#10.1111/j.1467-9280.2009.02426.x)

What is the replication article URL meant to store.

this is meant to store a URL to the full-text of the replication article (in whatever form, PDF, HTML, or preprint). this is now irrelevant given that we're dropping the replicates for now, but i've now added an "Original article URL" within the "Replication details" section.

One link per transparency, rather than allowing multiple (this wont require a data model change, we can enforce in UI)

correct.

is your recommendation that @alexkyllo take the lead on schema changes, and I focus on the UI?

yes this sounds like a good plan as long as @alexkyllo is game.

alexkyllo commented 5 years ago

I'm game--though my initial reaction is that the entity model needed to support this UX is so different, and so much simpler, that implementing it in the existing Django app will mean deleting a lot of code--to the point where it might be easier and faster to just start a new app from scratch.

@eplebel what's the likelihood that you decide you want to go back to the deep-linking approach in the future? Did you decide that it was the wrong strategy altogether? Or are you thinking that you want to start off with a "flatter" design for the initial release, and add deep-linking features incrementally over time, eventually ending up with a complex entity model again?

A few other questions I have:

eplebel commented 5 years ago

what's the likelihood that you decide you want to go back to the deep-linking approach in the future? Did you decide that it was the wrong strategy altogether? Or are you thinking that you want to start off with a "flatter" design for the initial release, and add deep-linking features incrementally over time, eventually ending up with a complex entity model again?

the likelihood is greater than 50%. and yes the thinking is to start off with a "flatter" design initially, and add deep-linking features incrementally over time (which i'd say wasn't a wrong strategy, but just too expensive development and user time-wise relative to the value created).

Shouldn't Article.authors be a multi-autocomplete field, in order for the "list articles for author" UX described in #12 to work?

i don't think we need this for now. we can list all unique article IDs linked to a specific author, even though the authors field is just text (and even though a unique article can be linked to other authors or not linked to any author at all). this means author names within article cards cannot be linked to author pages, but that's OK given this will only be the case for < 2% of authors.

What should happen when the user clicks the "+ Add" button under Figures/Tables? ... I don't think there's space there for more than one or two... which direction would the list grow on the page if I add a lot of them?

this was done inline in the new app (https://curatescience-staging.appspot.com/new/article/8/edit), but for now, we will go with the simpler process of opening a "browse for file" window and letting users select the figures/table images (multiple selection allowed; and a URL can also be inputted as is standard for this type of dialog window). the image(s) will then be uploaded and thumbnailed. the image list can grow vertically within a 3-column div (given there will be < 6 figures/tables 90% of the time).

Could you please explain the counts of citations, downloads, and views for the URL links?

citations, downloads, and views are values manually curated from external sites (e.g., Google Scholar, publisher websites), so they are just text

What does "Protected access" mean?

this just means the data/materials are posted online, but only registered researchers can access them (so it's more transparent than not publicly posting data at all). for now, "protected access" will appear as text in the popup (but later this will be integrated into the open data and materials badges).

How important is the radio button for "Preregistered study design + analysis", "Preregistered study design", "Registered report"?

this is crucial because it changes the preregistration badge type and "Registered Report" label (on the article type line, e.g., http://curatescience.org/author/daniel.lakens.html#10.3389/fpsyg.2014.00875).

Are you sure you don't want the Commentaries and Replicates to be links to other Article pages on your site?

yes.

And are you sure you want to add nested child entities (Commentaries and Replicates) within this modal window? If I click "+ Add additional commentary" many times, will the modal window keep growing downward indefinitely? Will it add a vertical scroll bar after n rows?

yes the modal window growing downward is fine because there will rarely be more than 1-2 commentaries.

alexkyllo commented 5 years ago

I've gone through and listed out in detail the things I will need to change in the app's backend Python code in order to support this, that way I can use it as a checklist to make sure I get them all done. My guess is that this will be 10-15 hours of work for me. Let me know if you think anything is missing or looks inaccurate:

Making schema changes this drastic without losing any data would be significantly more challenging and time-consuming. Is it OK to delete all the data from the tables in order to apply these schema changes? I can write a new migration script to populate the same set of initial Article and Author entities back into the database if you want, or we can start with an empty database--let me know.

Related to that, do you still want the backend to support Article data import/export via Google Sheets and/or Excel spreadsheets? That seems like a sizeable feature, I'm guessing another 10 hours. Although if we have that feature, it's probably unnecessary to write a seed migration, so that would save an hour or two.

eplebel commented 5 years ago

looks good. only 1 inaccuracy and a few follow-up questions:

Is it OK to delete all the data from the tables in order to apply these schema changes? I can write a new migration script to populate the same set of initial Article and Author entities back into the database if you want, or we can start with an empty database--let me know.

yes it's OK to delete all current data. but yes it'd be good to populate the database with a few of the current articles, but with "flattened" data structure (e.g., https://curate-science-216207.appspot.com/articles/8/ , https://curate-science-216207.appspot.com/articles/2/).

Related to that, do you still want the backend to support Article data import/export via Google Sheets and/or Excel spreadsheets?

no that won't be needed.

alexkyllo commented 5 years ago

profile_urls as a JSON field works for me as long as it works for @onejgordon. JSON fields are schemaless so he can decide what key names to get and set.

We may need to add some of the models back later, but dead code is baggage and it's better to just delete it--we can always go back and find it in the git repo history and re-add it again later.

We actually have 3 model classes related to users, and they're associated through a foreign key relationship: Author, UserProfile, and User (which is built-in to Django admin system): link. I'm considering whether to combine Author and UserProfile into a single model. Do you expect that authors and admins will be the only two types of authenticated users? Or do you think we will have some other type of user account profile that is neither an author, nor an admin?

E-mail address is the username, but I do need to implement a feature for the ability for admins to send invites. Looks like there are existing packages for this, so it shouldn't be much work. And since only admins will have invite permissions, there's no need for an invite quota. But if you decide you want to give authors the ability to send invites with a quota, let me know.

eplebel commented 5 years ago

JSON fields are schemaless so @onejgordon can decide what key names to get and set.

sounds good.

we can always go back and find it in the git repo history and re-add it again later.

sounds good.

I'm considering whether to combine Author and UserProfile into a single model. Do you expect that authors and admins will be the only two types of authenticated users? Or do you think we will have some other type of user account profile that is neither an author, nor an admin?

yes we'll need non-admin non-author user accounts for volunteer and paid curators (who won't necessarily get admin privileges), so we probably need to keep separate Author and UserProfile models.

Looks like there are existing packages for this, so it shouldn't be much work. And since only admins will have invite permissions, there's no need for an invite quota. But if you decide you want to give authors the ability to send invites with a quota, let me know.

ok sounds good. and indeed authors should have an invite quota, which initially will be set to 0. only once the author page is tested and improved based on a small group of users (~15-20), will we want to give 2-3 invites to each user (slow scaling will allow more attention to user feedback and bug fixes/feature improvements, even though it is tempting to want to try to scale up more quickly).

onejgordon commented 5 years ago

JSON fields are schemaless so @onejgordon can decide what key names to get and set.

sounds good.

This works for me.

alexkyllo commented 5 years ago

@onejgordon if we're going to implement uploading and storage for KeyFigure image files, there are a couple of options for how the REST API can accept them. If the media type is application/json then the image can be a base64 encoded string within the JSON, or if the media type is multipart/form-data then the image can be sent directly, but the other KeyFigure fields would need to be sent in a separate request.

Do you have a preference? I'm not exactly sure how image uploading works on the JavaScript side of things, or which way is easier.

When you want to retrieve the stored image, there will be a URL link in the KeyFigure JSON to where the file is available on the web server.

onejgordon commented 5 years ago

I may have missed it, but have we determined whether this modal allows both creation and editing? It would simplify things a bit to know that when in the modal, the base Article model has already been created, and therefore the only capabilities of this UI are to:

If this is the case, a single image multipart/form-data API is easy to work with -- we'll upload on the fly as images are selected.

If it's necessary to allow creation through this form, then there's nowhere to link the figures to until the article is created, which complicates the flow. In that case, we might prefer a single multipart/form-data POST with multiple files included in the request. This is something I've done before (though not in django), but @alexkyllo it sounds like you're saying handling multiple files at once is difficult on the server side?

eplebel commented 5 years ago

I may have missed it, but have we determined whether this modal allows both creation and editing?

yes that was the plan (hence the "new/edit article info modal window" wording). it sounds possible that when the user clicks "ADD ARTICLE", a new base article model is created as the empty modal window is opened (and if the user clicks cancel, then the article can just be destroyed right?).

i don't have much to add re: how to store the images. whatever is simpler for now (as long as it can scale) and as long as it yields reasonable performance for the current limited budget.

alexkyllo commented 5 years ago

@onejgordon I think multipart/form-data can handle multiple files, but I don't think it can mix files and JSON

onejgordon commented 5 years ago

@onejgordon I think multipart/form-data can handle multiple files, but I don't think it can mix files and JSON

You're right, in my previous application I was using multipart form inputs + form parameters, which could include encoded JSON objects, but not a traditional JSON body as would be consistent with the rest of the CS API.

a new base article model is created as the empty modal window is opened (and if the user clicks cancel, then the article can just be destroyed right?).

We could do it this way, but we can't guarantee the cancel button will be clicked (users can exit the UI by closing a tab, computer crash, etc).

This general need is a fairly common webapp flow, so we can take cues from other solutions. Some considerations to help guide our solution:

alexkyllo commented 5 years ago

I agree with @onejgordon , the saving to the database should be done in one shot, we don't want to leave a transaction hanging open across requests and have to roll it back if it's abandoned. That would cause all sorts of problems.

One possible option is I can give you a REST endpoint to submit the KeyFigure metadata to, then in the response body, return you a second URL to PUT the binary image data to directly. This would be like what YouTube does when you upload a video.

Or, for article creation, I could just give you one big multipart/form-data endpoint instead of a REST endpoint. I can give you the version of the HTML form that Django renders, and you can just reimplement it in React. Would this be easiest way? I think it's the easiest from my perspective.

eplebel commented 5 years ago

have a live/published property on the article that is automatically flipped once all required fields are present

this sounds good if it simplifies things

alexkyllo commented 5 years ago

How important is the image upload feature anyway? Aren't the article figures generally already hosted somewhere else? Can we just continue to hotlink them like we've been doing, and punt on image uploading to a later version of the app? It seems like a feature that's kind of expensive to implement, relative to the value it would add to the user experience.

eplebel commented 5 years ago

unfortunately yes the image upload is foundational for the MVP. this is because replication results are most easily comprehended/ingested by examining a key figure and/or table (e.g., http://curatescience.org/#images-4 ; http://curatescience.org/#images-1 ). this was one of the key deficiencies of our first beta: our UI curate replication results could not accommodate the wide variety of ways that replication results are summarized for different kinds of study designs/research questions (and this is probably near-impossible). authors know best how to summarize their replication results (or original results), and this is virtually always done via a figure or table (or sets of figures/tables).

a small minority of figures/tables are already available online, but only about < 5% and in virtually all of these cases, these don't include figure or table captions, which are crucial for understanding the figure/table.

onejgordon commented 5 years ago

One possible option is I can give you a REST endpoint to submit the KeyFigure metadata to, then in the response body, return you a second URL to PUT the binary image data to directly. This would be like what YouTube does when you upload a video.

Based on the conversation here, I suggest we assume that when the user is at the stage of uploading images, the article has already been created. Then we can limit ourselves to three API calls:

1) Article create/update, 2) Key figure upload (with multipart image data, and specifying article ID) 3) Key figure deletion

Does this sound like a reasonable solution?

Or, for article creation, I could just give you one big multipart/form-data endpoint instead of a REST endpoint. I can give you the version of the HTML form that Django renders, and you can just reimplement it in React. Would this be easiest way? I think it's the easiest from my perspective.

Though things are simpler now with less of the nested object relations, translating an HTML form into a JS API call is not the most straightforward way. I think I'd prefer to just get a list of param names and types, and I'll construct the JSON data to post to the endpoint.

alexkyllo commented 5 years ago

@onejgordon Sounds reasonable, does that mean the KeyFigure metadata would be created in step 1 at the time the article is created/updated? Then once the article is saved, each KeyFigure "row" would have a link/button that you click to get a file upload window?

onejgordon commented 5 years ago

I was thinking that at creation time no key figures would be linked to the article. The second API call would upload an image file (or just a URL), with a parameter article_id to link it to the parent object.

Which metadata are you referring to?

alexkyllo commented 5 years ago

I mean the other fields on the KeyFigure model, like figure_number, is_figure, is_table: https://github.com/ScienceCommons/curate_science/blob/redo/curate/models.py#L159

onejgordon commented 5 years ago

If we make this a multipart/form-data POST creation endpoint can we not pass these as additional params?

eplebel commented 5 years ago

I mean the other fields on the KeyFigure model, like figure_number, is_figure, is_table:

these (figure_number, is_figure, is_table) have all been dropped in the new simplified specs (hence why none of these were mentioned above in the text or mock; but i guess i c/should have explicitly stated this, sorry for any confusion).

so users will just upload the figure/table image (s) (which will include the figure caption or table caption) and that's it (well beyond compressing/thumbnailing it).

alexkyllo commented 5 years ago

Oh ok, so now it's just a bag of images associated to an Article, with no metadata. That simplifies things! Then I think I can delete the KeyFigure model and just implement a file upload endpoint that accepts the article id as a parameter, like @onejgordon was describing.

onejgordon commented 5 years ago

I was actually imagining keeping the KeyFigure model as a child of Article. If you flatten this, does that mean you'll be storing multiple URLs per figure (e.g. thumbnail + full) as one or multiple arrays within the Article model? Or will you be storing full blob data in the data model?

For deletion, will we still have a separate endpoint?

alexkyllo commented 5 years ago

Actually what I'll probably do is keep KeyFigure but with only these fields:

Then I'll need to implement a view controller that uses MultiPartParser to accept the image file uploads.

So once an article is created/saved, on the Edit Article page, you can present the Figures/Tables "+ Add" button, which will be an <input type="file" ...> , and once the user selects their file, you'll fire an AJAX form submission to this file upload endpoint. Does that sound right?

onejgordon commented 5 years ago

That works. How do you plan to handle thumbnails?

I've used Google Cloud Storage to upload and serve images, and I believe it has a convenient way to serve thumbnails as well. Might this help in writing the put/serve handlers? I've never used Django's ImageField.

https://cloud.google.com/python/getting-started/using-cloud-storage#serving_images_from_cloud_storage

alexkyllo commented 5 years ago

For thumbnails I think I will need to write some code that uses PIL/Pillow to generate and save the thumbnail at the time you upload it, like this: https://stackoverflow.com/questions/23922289/django-pil-save-thumbnail-version-right-when-image-is-uploaded

alexkyllo commented 5 years ago

Ok, image upload and thumbnails are working in my feature branch: https://github.com/ScienceCommons/curate_science/commits/redo

If you PUT form-data to http://localhost:8000/api/articles//key_figures/upload/ with an image in a form field named "file" then it will be saved along with a thumbnail (max dims 75x75, this is configurable). It validates that the file is an image; accepted types are JPG/JPEG, GIF, PNG.

Then you can access the image at: http://localhost:8000/media/key_figures/filename.jpg http://localhost:8000/media/key_figures/filename_thumb.jpg

@onejgordon how do you want to approach integration, since there are breaking changes to the API? Do you want me to just push my code to staging and let the UI break? Or do you prefer to merge and integrate locally on a feature branch, and push to staging when you're done?

onejgordon commented 5 years ago

Great, sounds good.

I'll do merges from /redo to my branch /new_shallow_uis and fix integration issues there before merging to staging for live tests.

alexkyllo commented 5 years ago

@eplebel should an author be able to edit articles that aren't already linked to their author profile? Should editing an article that isn't linked to your profile, cause it to automatically become linked?

If we block authors from editing an article that isn't linked to their profile, and make them link it first before editing, it seems like that would just create an extra step. Plus it would require a separate API endpoint for linking articles in addition to the existing one for editing articles (which supports linking them to authors).

eplebel commented 5 years ago

should an author be able to edit articles that aren't already linked to their author profile? Should editing an article that isn't linked to your profile, cause it to automatically become linked?

this isn't relevant for the initial release because there won't be any DB-served articles on the homepage for an author to try and edit or link (homepage will just display raw HTML of static featured articles).

but once there are, it will be relevant. and i'd say no: authors should not be able to edit articles that aren't already linked to their author profile. this gives them too much power given we aren't yet tracking user contributions/history (only admins will be able to edit articles not-yet-linked to an author profile). but once DB-served articles are being displaying on the homepage, authors could link an article there to their author page (by using the same functionality that is currently being implemented to link a pre-existing article to an author page for #12 ).

in the future, however, once we are tracking user contributions/history, then authors would be able to edit articles not-yet on their author page, given that even non-published non-author users will be able to edit specific article fields (which is also why wouldn't want to automatically link articles that a user edits).

alexkyllo commented 5 years ago

Ok, then I guess I won't restrict authors from editing articles at the API endpoint, that will be controlled via the UI for now. This will keep it less complex since linking an article to your author page is itself an "edit article" operation.

onejgordon commented 5 years ago

@alexkyllo will we be updating the ArticleSerializerNested to keep author_list as a simple charfield?

onejgordon commented 5 years ago

Remaining todo (note to self):

alexkyllo commented 5 years ago

@onejgordon authors and author_list are two different fields. Authors is the authors that have linked the article to their author pages; author_list is the char field listing all the authors who wrote the article. They're both provided at /api/articles/\<pk>/

onejgordon commented 5 years ago

Authors is the authors that have linked the article to their author pages; author_list is the char field listing all the authors who wrote the article.

Thanks for clarifying that.

eplebel commented 5 years ago
  • [ ] Examples of rendering of commentaries in article list items?

http://curatescience.org/author/etienne.lebel.html#10.1126/science.aac4716 untitled

onejgordon commented 5 years ago

@alexkyllo as per Etienne's comment above, commentaries appear immediately in the list view, so it may be necessary to add these to the ArticleListSerializer. Key figures can remain in the detail only for now, since we can wait to do that API call until the 'more' toggle arrow is clicked, and the article editor already uses the detail endpoint.

alexkyllo commented 5 years ago

@onejgordon ok, this is done in staging, deploying now.

eplebel commented 5 years ago

3 most major issues for editing articles on staging (https://curatescience-staging.appspot.com/app/author/etienne-lebel):

onejgordon commented 5 years ago

Thanks for pointing out the replication UIs still missing, will add these to the list.

cannot save an article when some fields have input text greater than about 150 characters

These are set server side currently (looks like 255-char limit currently). @alexkyllo can these be raised safely? 5000 seems a bit extreme for things like title, but I'm also not aware of the benefits/drawbacks of large max lengths in postgres.

latency

I noticed this too, though it seems faster than the previous iteration. @alexkyllo do you think API calls will return faster once we move to prod?

eplebel commented 5 years ago

These are set server side currently (looks like 255-char limit currently). @alexkyllo can these be raised safely? 5000 seems a bit extreme for things like title, but I'm also not aware of the benefits/drawbacks of large max lengths in postgres.

ah ok. ya for most fields this will be overkill, but for a few fields (i.e., author_contributions and funding_sources), these definitely need to be at least 4,000 characters to cover edge cases where an article involves a mega-collaboration involving hundreds of authors.

@alexkyllo if lower limits maximize DB performance, i can go through and provide upper-bound limits for each field, just let me know.

alexkyllo commented 5 years ago

Yes, it's fine to increase the size of char fields, it just takes up more space in the database, but we have a small amount of data so I'm not worried about it.

I'll increase the following to max_length=4000:

alexkyllo commented 5 years ago

I changed author_contributions, competing_interests, and funding_sources from CharField(max_length=255) to TextField which can have practically unlimited length (I think actually 65535 bytes) in postgres. Let me know if any others

eplebel commented 5 years ago

OK thanks. I think we're good for now.

I was going to suggest changing article_title to a TextField (to be safe), but just realized that (rare) super-long titles may look pretty ugly in article lists. So perhaps better to just keep as it is now ( 256 character-limit; though if a user attempts to save a title longer than this, i guess we should display an error message to this effect). we can expand the limit later when we implement title trimming for these super-long edge case titles.

eplebel commented 5 years ago

By the way, I was just checking the new figure uploading (from a local folder), and it's working, very exciting!

That said, it appears the UI has not been implemented as instructed in the original specs (https://github.com/ScienceCommons/curate_science/issues/10#issue-407523585) and as re-iterated in a Feb 12 comment (https://github.com/ScienceCommons/curate_science/issues/10#issuecomment-462585590).

clicking "Add figure" button will open a "browse for file" window and let users select the figures/table images (multiple selection allowed; a URL can also be inputted as is standard for this type of dialog window). the image(s) will then be uploaded, compressed (i've been using https://tinypng.com/), and thumbnailed ("Add figure" button remaining to the right of thumbnails as is currently implemented). the image should grow vertically within a 3-column div (given there will be < 6 figures/tables 90% of the time).

This was decided for the crucial reason that it's way faster and simpler for the user (only ONE click away to select figure files rather than TWO clicks to select files and then an extra "Add Figure" click). and again, linking to a pre-existing figure URL will happen <1% of the time (and even in that case, can be sneakily done via the browse for file dialog window; as already mentioned).

onejgordon commented 5 years ago

URL can also be inputted as is standard for this type of dialog window

@eplebel, I was confused by this, so wasn't sure what to implement. What is the sneaky method you're mentioning? I worry it may be OS specific, as I haven't seen anything like this on the standard file input in Chrome on MacOS. If URL input is very rare (<1%) I might suggest not including it at this phase, as I believe @alexkyllo will need to implement a separate server-side download from the specified URL to support this case.

eplebel commented 5 years ago

ok let's just drop URL input for now (the sneaky method is just inputting a URL into the open dialog window, which works in windows, but it appears this may not be supported on MacOS).