somerandomsequence closed 6 years ago
Hey Caleb. I checked out the link for this. Wow, that is super handy, I was doing stuff the hard way! I'm a little confused about a couple of the extensions though. Suppose I want my function to return the "spectra" for sample 6880. According to the link above, I would use a urllib query for "https://api.hpc.nrel.gov/xrd/api/samples/6880/spectra" and it would return just the optical and XRD spectra for the sample. However, when I do that, I get a 404 error. Am I querying it correctly? I assumed it would have the same format as the "xrf" queries, such as "https://api.hpc.nrel.gov/xrd/api/samples/6880/xrf" (this works just fine), but that format doesn't appear to work for either the /spectra or the /prop extensions.
Hi Marcus,
The format for these endpoints is a little nonstandard to support querying information for multiple libraries:
https://api.hpc.nrel.gov/xrd/api/samples/spectra?ids=6880
and, e.g., for multiple libraries all at once:
https://api.hpc.nrel.gov/xrd/api/samples/spectra?ids=6880,6881
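As a rough sketch of how these URLs could be built and queried with urllib (the helper names build_url and fetch are made up for illustration, and I'm assuming here that the response body is JSON):

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.hpc.nrel.gov/xrd/api/samples"


def build_url(endpoint, sample_ids):
    """Build a query URL such as .../samples/spectra?ids=6880,6881.

    Hypothetical helper, not part of the existing lib files.
    """
    ids = ",".join(str(i) for i in sample_ids)
    # safe="," keeps the commas literal instead of escaping them as %2C
    return f"{BASE_URL}/{endpoint}?{urlencode({'ids': ids}, safe=',')}"


def fetch(endpoint, sample_ids):
    """Fetch one endpoint for several ids at once (assumes a JSON response)."""
    with urllib.request.urlopen(build_url(endpoint, sample_ids)) as resp:
        return json.load(resp)


print(build_url("spectra", [6880]))
print(build_url("spectra", [6880, 6881]))
```

The point of the ids parameter is that one request can cover several libraries, so a loop over individual samples should not be needed.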
The lib files are now fully functional, but since they were previously using the private API, some slight tweaking is needed to make them query the spectra instead of looping over all positions in a sample. Also, in testing out the public API, I found several instances where a sample was found on the website but didn't appear to be available in the API. This seemed a little backwards. Any idea why that might be happening? Thanks, Marcus
Can you provide examples of samples that were in the public API but not the private API (and steps to reproduce)?
@meschw04 - in the current code, I'm not sure why the result from this:
t.spectra(which='optical')
looks different from:
t.spectra(which='xrd')
I'd expect them to have a similar shape, whereas the former is a single row containing lists and the latter is a proper data frame.
So here's the issue. With XRD, there are always 661 data points for wavelength, 661 for intensity, and 661 for background. This made it readily compatible to look like a classical pandas DataFrame. With optical, it's a different story. This is because near-infrared takes around 400 measurements while ultraviolet takes around 800 measurements. When I tried to put them all in the same pandas DataFrame, it complained that all of the columns wouldn't be the same size. My options were to either pad these shorter lists with nulls so that all the columns were the same length or to include them in a pandas DataFrame as a list under a single row. Is there a better way to do this? Thanks!
Hrm, okay, this seems reasonable. Thanks for clarifying.
I don't really have a strong preference between the 1-row-containing-lists and the padded data frame. From a user perspective it's a bit surprising to get a different output "shape" depending on the input parameters from a function. On the other hand, it's also a bit surprising to have padded/null data. I'd probably err towards the padding.
I decided to pad with nulls using pd.concat([...], axis=1). I also adjusted the basic importing notebook and the optical notebook to reflect this change.
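For anyone following along, here is a minimal sketch of the padding approach. The column names and values are made up for illustration; real scans have around 400 (near-infrared) and 800 (ultraviolet) points, and the actual lib code may differ.

```python
import pandas as pd

# Hypothetical optical measurements of unequal length.
nir = pd.Series([0.1, 0.2, 0.3], name="nir_intensity")
uv = pd.Series([1.0, 1.1, 1.2, 1.3, 1.4], name="uv_intensity")

# axis=1 places the Series side by side and aligns them on the index,
# so the shorter column is padded with NaN at the end.
optical = pd.concat([nir, uv], axis=1)

print(optical)

# The original, unpadded column can be recovered by dropping the nulls.
nir_only = optical["nir_intensity"].dropna()
```

The result has the same "proper data frame" shape as the XRD output, at the cost of trailing nulls in the shorter columns.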
The current code relies heavily on querying individual samples' properties. Instead, we can make use of the more compact API functions.
For instance: