anthonymoser / better_data_portal

a streamlit app for searching socrata data portals
GNU General Public License v3.0

Efficiency suggestion: Only search datasets #2

Open levyj opened 4 years ago

levyj commented 4 years ago

I may be misunderstanding what you are doing, either technically or in intention. However, you may be unnecessarily searching derived views (such as filtered views), which will never contain records that are not also in the parent dataset.

For example, if you want to find my salary record (might as well; I think everyone else has!), you do not need to search all of the views below. Anything appearing in the last five will, by definition, appear in the first one. In effect, you can search just https://data.cityofchicago.org/browse?limitTo=datasets.

That said, you may realize this and intentionally be highlighting the derived views, as well, so people will know about them. If so, fair enough.

image

anthonymoser commented 4 years ago

Thanks for these suggestions. I know it's taken me a while to get around to them. Do you know if there is a way to filter out derived views based on the metadata? I'm using the Sodapy library, and the method for getting datasets doesn't really offer a good way to request only primary datasets. I'm already filtering out the maps based on metadata, so I could also remove the derived views, if they're classified as such.

In full disclosure I also had not realized the Chicago data portal was structured in that way. Is that a feature of Socrata's platform, or is that an implementation choice? I've noticed that other cities and entities using the Socrata platform seem to have a much more haphazard approach to the creation of data sets, which is a credit to Chicago's effort.

levyj commented 4 years ago

Kind of in reverse order:

Thanks!

If I am understanding the question right, this is a Socrata feature. From https://data.cityofchicago.org/browse, for example, it is basically the difference under View Types between Datasets and Filtered Views.
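As a rough illustration of that Datasets vs. Filtered Views distinction in metadata terms, here is a minimal sketch. It assumes each catalog entry carries a `resource` dictionary with a `type` field (values such as `dataset`, `filter`, `map`), which is how the Socrata catalog API appears to label view types. The function name `primary_datasets_only` and the sample entries are hypothetical, made up for illustration, and are not real Chicago portal records.

```python
from typing import Any, Dict, Iterable, List


def primary_datasets_only(catalog_entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only entries whose resource type is 'dataset'.

    Assumption: each entry looks like {"resource": {"type": ..., ...}, ...}.
    Entries without that shape are dropped rather than raising an error.
    """
    return [
        entry
        for entry in catalog_entries
        if entry.get("resource", {}).get("type") == "dataset"
    ]


# Hypothetical sample entries (IDs and names are invented for illustration).
sample_catalog = [
    {"resource": {"id": "aaaa-0001", "name": "Employee Salaries", "type": "dataset"}},
    {"resource": {"id": "aaaa-0002", "name": "Employee Salaries - Police", "type": "filter"}},
    {"resource": {"id": "aaaa-0003", "name": "Employee Salaries Map", "type": "map"}},
]

kept = primary_datasets_only(sample_catalog)
print([entry["resource"]["name"] for entry in kept])
```

If the library forwards an `only` keyword to the catalog API, something like `client.datasets(only=["dataset"])` might do the same filtering server-side, but that is untested here and should be checked against the Sodapy documentation.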

I actually have not used Sodapy (or even Python, much as I should learn it). Looking at the documentation briefly, I thought I might have an answer, but now I am not sure, because I do not know what API it is using behind the scenes. What I think I have figured out: