bmrb-io / BMRB-API

BMRB API server and client implementations.
GNU General Public License v3.0
9 stars 2 forks source link

Add endpoint highlighting available data for partner databases #40

Closed jonwedell closed 6 years ago

jonwedell commented 6 years ago

Add a new endpoint that, for a given PDB, provides related BMRB entries and what data types they have available, where the data is located, and potentially a small image representing the data.

Related entry example: https://www.ebi.ac.uk/pdbe/entry/pdb/2joq Widget example: https://www.ebi.ac.uk/pdbe/entry/pdb/4k6a

@agutmanas @mvaradi

jonwedell commented 6 years ago

Beta version implemented in abf7fd0 and preceding commits.

See example here: http://webapi.bmrb.wisc.edu/devel/search/get_bmrb_data_from_pdb_id/2kpn?prettyprint=true

jonwedell commented 6 years ago

@agutmanas @mvaradi - I'm happy to change to data layout or add additional information based on your feedback.

If you only want to link to joint OneDep entries (as opposed to BLAST-matched or author-provided BMRB-PDB links), make sure to only link to entries whose 'match_types' field includes 'BMRB Entry Tracking System'.

The 'size' field only exists for time domain data, and it is the size of the data in bytes. I think the rest of the fields are pretty self-explanatory, but let me know if you have any questions.

mvaradi commented 6 years ago

Thank you for creating this endpoint so promptly :) I'll be comparing it to what we have from other resources, but at the first glance, it looks very sensible to me!

mvaradi commented 6 years ago

@jonwedell - I would like to ask three questions:

1.) What values can "data_type" have? Only "Spectral peak lists" and "Time domain data"? 2.) Is there an upper limit for the number of "data_sets" for "Spectral peak lists"? What would be an average number? - Depending on the answer I will have to look into different ways of rendering it 3.) Will there be a URL to a thumbnail? If so, what will it be the thumbnail of? What will be the name of the field, and where will it be in the structure?

Thanks, M.

dmaziuk commented 6 years ago

I'm guessing Jon will answer in a few hours when it's daytime in US, in the meantime take a look at http://www.bmrb.wisc.edu/dictionary/tagcat.php?sfcat=entry_information --

I don't believe there is a programmatic upper limit.

jonwedell commented 6 years ago

@mvaradi

  1. The full list of potential values is:
    T1rho_relaxation
    pKa_value_data_set
    representative_conformer
    pH_titration
    order_parameters
    dipole_dipole_cross_correlations
    heteronucl_NOEs
    other_data_list
    binding constants
    spectral_peak_list
    binding_constants
    heteronucl_T1rho_relaxation
    chemical_rates
    dipole_CSA_cross_correlation_relaxation
    RDCs
    pKa_values
    heteronucl_T2_relaxation
    H_exch_rates
    homonucl_NOEs
    coupling_constants
    H_exch_protection_factors
    heteronucl_T1_relaxation
    spectral_density_values
    conformer_family_coord_set
    pH_param_list
    pH_NMR_param_list

    In practice most of these are rare. You can see the full distribution of which entries have what data types here. Chemical shift data is excluded because that is our primary data type, all entries have it, and you already link to us there when relevant.

  2. There is no enforced maximum number, but in practice I don't see more than 9. Other data types could potentially have more than 9, but probably have similar distributions to the spectral peak lists. Here is the distribution (excluding entries with no peak lists at all):
    num_peaklists | count 
    ---------------+-------
             1 |    88
             2 |   127
             3 |   162
             4 |    59
             5 |    24
             6 |     8
             7 |     5
             8 |     3
             9 |    33
  3. I think we will be able to provide a thumbnail URL for the time domain data, but potentially not for the other data types. I'll look into this further and get back to you. The field will be named "thumbnail_url" and be located at the same level of the heirarchy as "data_type" here:
      {
        "thumbnail_url": "http://to.be.populated",
        "data_sets": 28, 
        "data_type": "Time domain data", 
        "size": 574023046, 
        "urls": [
          "ftp://ftp.bmrb.wisc.edu/pub/bmrb/timedomain/bmr16561/"
        ]
      }

    The thumbnail will most likely be a generic "time domain data" thumbnail rather than an entry-specific one.

Finally, the time domain data is the most important and the one we would want to highlight the most. So perhaps the widget would always show time domain data when present with a direct link, and then if other data types are present, show an "additional data types available" indicator with some sort of button to click to expand the list of additional data. Or perhaps list the additional data types available but not all the sets of each type unless clicked. Does that seem like a reasonable design to you?

Cheers, Jon

mvaradi commented 6 years ago

@jonwedell & @dmaziuk - Thank you both for your answers! I'm coding the component now, so I'll soon be able to show you a prototype...

Cheers, Mihaly

mvaradi commented 6 years ago

@jonwedell - Please find a prototype of the component here: https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2kpn Can you please comment if it looks sensible? Are there things you would prefer differently?

jonwedell commented 6 years ago

@mvaradi - Thanks for creating this so quickly! Unfortunately I get a 404 on that link. Is there an issue with the server?

mvaradi commented 6 years ago

Ops.. Indeed, the whole dev server is down for some reason; I hope they will bring it online soon. In the meantime, here is a screenshot:

raw-exp-data-with-bmrb
mvaradi commented 6 years ago

And here is how to full screen should look like:

raw-exp-data-with-bmrb-full

jonwedell commented 6 years ago

@mvaradi - I like the way it looks a lot! Thanks. A few minor comments:

  1. While there will often be more than one time domain "data set" there will only ever be one link to the actual time domain data (the link is to the root of a tree that contains all the sets). So perhaps the hyperlink should be "Time Domain Data" rather than "Set 1".
  2. The thumbnail URL is now present in the API for time domain data.
  3. Can you change "Available data types" to "Additional data available" and remove the hyperlink. I would also omit time domain data from this section as it is also linked. Then, would it be possible for the remaining data types to list their available data sets (as hyperlinks) upon clicking? If this is not possible or too complicated, the link to the entry page to get access to the additional data is okay (as it is now).
  4. Does the library you are using to make the API query support supplying custom headers in the HTTP request? If so, it would be great if you could include an 'Application' header with '[EBI|PDBe] summary page' or something so that we can more easily track these requests in our logging. More details here.

Thanks again, Jon

mvaradi commented 6 years ago

@jonwedell - Thank you for your suggestions, I'll make changes accordingly! 1. and 2. is straightforward, 3. I'll have to think about a bit more, and 4.) I'll have to check (this Polymer component uses iron-ajax to make the API call, I'll read up on including the suggested header.

Sincerely, Misi

mvaradi commented 6 years ago

@jonwedell - Hi again, I made changes based on your suggestions: Instead of "Set 1" the link says "Download" (1.); the correct thumbnail is displayed (2.); the list says "Additional data available" and displays the links to (as indices from 1); and I've added this to our iron-ajax: headers='{"Application": "[EBI][PDBe] raw-exp-data"}' - It does seem to work and send this header (4.)

Please find a screenshot here:

screen shot 2018-07-19 at 16 10 03

mvaradi commented 6 years ago

Minor fix to rendering the thumbnail:

screen shot 2018-07-19 at 16 24 46

mvaradi commented 6 years ago

This version is available on the dev server: https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2kpn (I hope no one kills it before you can look at it :) )

jonwedell commented 6 years ago

Thanks, the current versions looks great to me. I'll check with the rest of the staff here to see if any of them have comments but I think we're pretty much there.

perzeus commented 6 years ago

I agree. This looks good. Thanks all for your work on this. Pedro

Pedro R. Romero Director Biological Magnetic Resonance Data Bank Department of Biochemistry University of Wisconsin - Madison B1118a Biochemistry 420 Henry Mall Madison, WI 53706 email: promero@wisc.edumailto:promero@wisc.edu Voice: 608-265-5741 FAX: 608-262-3759

On Jul 20, 2018, at 10:43 AM, Jon Wedell notifications@github.com<mailto:notifications@github.com> wrote:

Thanks, the current versions looks great to me. I'll check with the rest of the staff here to see if any of them have comments but I think we're pretty much there.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/uwbmrb/BMRB-API/issues/40#issuecomment-406640065, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AKZSd4d6MLa2SouAYqbT2JbMryLFMcrRks5uIfqKgaJpZM4U4B6F.

mvaradi commented 6 years ago

Thank you for all the work and feedback! :) I'll schedule this for production in our release next week.

jonwedell commented 6 years ago

Awesome. One final and minor note, can you change the hyperlink from "Download" to "Access"? Thanks!

And please use http://webapi.bmrb.wisc.edu/v2/ rather than http://webapi.bmrb.wisc.edu/devel/ for the release code. I'll have the new endpoint available there by the end of the day.

Cheers, Jon

mvaradi commented 6 years ago

Thank you, will do! I'll also have to do a small fix for cases when data is [] - right now it displays "Additional data: " for such cases, which is wrong

jonwedell commented 6 years ago

right now it displays "Additional data: "

Ah, yes, good catch. One other thing I just noticed when trying some different PDB ID's is that we don't distinguish between the exact BMRB entry corresponding to the PDB entry and other matched entries. The endpoint is currently returning entries that are found during a BLAST search in addition to the exact corresponding entry. It's probably appropriate to show those, but we might want to use text or some sort of visual divider to distinguish the exact matched entry versus "similar entries". For example, see here. The exact match (if present) will always be the first result, but it would be good to highlight it.

The entry that is an exact match will have Exact in the match_types field.

dmaziuk commented 6 years ago

The endpoint is currently returning entries that are found during a BLAST search in addition to the exact corresponding entry. It's probably appropriate to show those ...

Actually I don't think it is in this case. We've been through this more times than I care to remember, and the bottom line is: if you want the BMRB entry for a PDB structure, it's the one from the deposition system only. BLAST matches are useless for this.

mvaradi commented 6 years ago

@dmaziuk - Yes, we are on the same page with this. Our intention is to have BLAST-based links displayed elsewhere, where we show data that is related to a PDB entry, but was not used in determining the structure.

It is a bug if the raw-exp-data component shows BLAST-based information as a check for this was supposed to be in place.

jonwedell commented 6 years ago

when data is []

I fixed this bug on my end - these shouldn't have been there in the first place. Entries are only returned now if they have at least one type of data. Also the /v2/ endpoint is live.

mvaradi commented 6 years ago

Thank you, Jon :) I'll do my fixes by Monday for the text and the Exact match

mvaradi commented 6 years ago

@jonwedell, @dmaziuk - Thank you again for all the feedback, and also for fixing the "data: []" problem!

I looked into it, and the widget was rendering for non-exact matches, because in "match_types" it was looking for "BMRB Entry Tracking System", which was the originally suggested term in this thread for making the filtering - I saw that now that's not the case anymore, so the component is looking for "Exact" instead, as advised.

I've also switched the API from "devel" to "v2", and renamed "Download" to "Access"

Please find an example for the updated component at: https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2kpn (This should render) https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2aqa (This should not render)

mvaradi commented 6 years ago

@jonwedell - I have one small request: Could you please change the thumbnail URL to HTTPS instead of HTTP? Some browsers may complain about "mixed content", for example Chrome issues this warning: "Mixed Content: The page at 'https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2kpn' was loaded over HTTPS, but requested an insecure image 'http://www.bmrb.wisc.edu/images/fid.svg'. This content should also be served over HTTPS."

Thanks, Misi

jonwedell commented 6 years ago

@dmaziuk - The image is currently hosted under /images/ on the main web server. If you're still opposed to enabling HTTPS I can serve the image from the API server itself.

@mvaradi - Ah, sorry for not making it more clear we had made that change. I recall encountering the non-exact match display issue before changing the match nomenclature, but it's quite possible I deployed the change before testing rather than after and am misremembering the chronology. I think the current version looks great! Making the thumbnail a hyperlink to the data is a good touch as well.

dmaziuk commented 6 years ago

Put it on the api server, tomorrow they'll start complaining it comes from a different "domain" or something anyway.

mvaradi commented 6 years ago

@jonwedell - Thank you, and no worries, I actually did have a bug in there regarding this too, but it should work as intended now.

For example this entry has an "Exact" and a BLAST only match as well, and shows only the "Exact", as intended: https://wwwdev.ebi.ac.uk/pdbe/entry/pdb/2m68

dmaziuk commented 6 years ago

The long explanation is that "exact" as well as author-provided "related" matches exist at the entry level whereas BLAST matches are sequence-based and thus exist at the entity level. On top of which there may be matches at the assembly, entity, and chem. comp. levels as well. And there may be multiple assemblies, entities, and chem. comps. in an entry, of course.

mvaradi commented 6 years ago

@dmaziuk - Thank you for the clarification!

jonwedell commented 6 years ago

@mvaradi - Thumbnail image location updated. It will now point to a HTTPS URL if the request to the API is made using HTTPS. Everything should be good to go. 👍

mvaradi commented 6 years ago

@jonwedell - Thank you, excellent! 👍 We are ready to move it into production on our end too - I'll stage the update tomorrow morning for release. We'll probably also have a news item about this update on our home page's news section, and on social media.

jonwedell commented 6 years ago

We'll probably also have a news item about this update on our home page's news section, and on social media.

That's excellent!

I see that the change is live. Thanks again!

Jon

mvaradi commented 6 years ago

Thanks to you too :) It has bee a good micro-project!