Add feature to get historical data for a particular location

Milind220 commented 2 years ago

Currently only live data can be fetched given a location. Most users would find more utility in large datasets of historical data.

I don't think that the WAQI API is capable of providing historical data, but the WAQI website does have a resource for downloading CSVs and Excel sheets of historical data. Perhaps it would be possible to download and read the CSV from there programmatically.

Milind220 commented 2 years ago

I think there may be a way to do this by using Requests to fill up the form on this page, and then programmatically clicking the download button to get the CSV onto the user's computer, in the same directory as their project. From there it could be read into a dataframe to get all the data they need, or left as a CSV for them to do that themselves.

Samxx97 commented 2 years ago

Hello, this seems interesting. Can I work on it?

Milind220 commented 2 years ago

@Sam-damn yeah sure! That'd be awesome💯

This is a pretty major feature to add, so I think a new feature branch is probably a good idea for it.

Milind220 commented 2 years ago

@Sam-damn I've created a new branch, hist-data, for this feature. Make your commits there, and when the feature's ready we'll merge it into dev. If you're curious about the branching model we follow for Ozone, you can check out the discussion about it.

Good luck!

Samxx97 commented 2 years ago

Alrighty, I'll check out the branching model so I can familiarize myself with the process and begin! Thanks for the info.

Samxx97 commented 2 years ago

So I have been researching how this can be accomplished using the requests library. In order to submit the form and get the historical data file, you have to emulate the requests your own browser sends when you press the submit button. So I inspected the network tab in my browser's developer tools, saw the POST request my browser sends on clicking submit, and tried to emulate it exactly using requests. The problem is that all it sends back as a response is an IP and a status, whereas in a browser it generates a downloadable file with a save-as dialog. Do you have any idea whether I should be performing a GET request on this IP that is returned to me? (I am not sure that is even possible, since GET requests are usually performed on a URL.)

Overall, I feel like this task could be accomplished using a headless browser tool such as selenium, but selenium requires other dependencies that cannot be listed as python packages. What are your thoughts on this?
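A minimal sketch of replaying a browser's form submission from Python. It uses the standard library's urllib so that it has no extra dependencies; the same idea should carry over to `requests.post(url, data=fields)`. The URL and field names below are placeholders, not the real WAQI endpoint or form fields, so the actual values would have to be copied from the browser's network tab.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (hypothetical): replace with the URL from the network tab.
FORM_URL = "https://example.com/historical-data/submit"


def build_form_post(url, fields, user_agent="Mozilla/5.0"):
    """Build (but do not send) a form-encoded POST similar to a browser's."""
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": user_agent,
    }
    return Request(url, data=body, headers=headers, method="POST")


# The field names here are made up for illustration only.
request = build_form_post(FORM_URL, {"station": "12345", "format": "csv"})
# To actually send it, pass `request` to urllib.request.urlopen and inspect
# the response headers and body to see what the server returns.
```

Comparing the headers and body built here against what the browser sends is one way to spot a missing cookie, token, or field that could explain a different response.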

Milind220 commented 2 years ago

@Sam-damn I've used selenium before, and I'm not opposed to it being used for this feature.

What do you mean by 'other dependencies that cannot be listed as python packages'?

Meanwhile, I'll do some research into whether Requests can actually be used for this at all.

Samxx97 commented 2 years ago

@Milind220 One of selenium's dependencies is a web driver, which is usually a binary that needs to be installed manually, so it cannot be listed as a pip package. Nevertheless, I think there's a python package that helps with this. If you find anything about whether requests can be used for this, do let me know 😁.

Milind220 commented 2 years ago

@Sam-damn After doing some research I'm confident that requests could be used for this. Here are some links to YouTube videos that do similar things (you can refer to them if you need to)

To fill up the form to access the downloads, this video of logging into websites using Requests would help. It's similar to the task we need to do: https://www.youtube.com/watch?v=bM50i7sKwwM

To download the files: https://www.youtube.com/watch?v=UMuO2_BVFwY

Lemme know what you find when you try it out!

Also did you have any luck with that other python package?

Samxx97 commented 2 years ago

I have made a lot of progress using selenium. However, much like with requests, I got stuck at the same stage, where a save-as dialog appears (“you have chosen to save this file”). This dialog is an operating system window, and since it's not an element within the browser, it cannot be accessed using selenium. I have tried to bypass this by changing the settings of the web driver profile to suppress the dialog box and allow an automatic save to a custom location, but this doesn't work for some reason.

PS: this dialog box seems to be a common problem, judging by the many Stack Overflow questions about it.
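For reference, a sketch of the profile settings usually suggested for this, assuming the browser is Firefox (the thread does not say which browser was used). The preference names are real Firefox preferences; the helper functions, the download directory, and the MIME types are illustrative assumptions. One common reason this approach fails is that the server labels the file with a different Content-Type than the ones listed.

```python
def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Return Firefox preferences that ask it to save matching files silently."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": str(download_dir),
        # MIME types to save to disk without showing the save prompt.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def apply_prefs(options, prefs):
    """Copy each preference onto an object exposing set_preference(name, value)."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


# Usage with Selenium (not executed here; needs selenium and geckodriver):
#   from selenium import webdriver
#   options = apply_prefs(webdriver.FirefoxOptions(),
#                         firefox_download_prefs("/tmp/aqi"))
#   driver = webdriver.Firefox(options=options)
```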

Samxx97 commented 2 years ago

As for using the requests library, this same issue becomes even harder to solve, because in order to download a file using requests you need a URL to perform a GET request on, which we don't have. Unfortunately, the links you sent me do not tackle this issue. Nevertheless, I will keep trying with selenium and keep you updated.

If downloading the actual CSV file doesn't work, as a last resort we can web-scrape the data from the table element (which appears after filling in the search bar and before submitting the form), construct a CSV file from that data, and then pass it to pandas or do whatever we want with it. However, I'm not sure whether that data table is complete or not. I would love your input on this 😁.
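The table-to-CSV idea could look roughly like the sketch below. It uses only the standard library and a made-up sample table; the real page's markup, column names, and values are not known here, so `SAMPLE`, `TableParser`, and `table_html_to_csv` are all illustrative names, and nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_csv(html):
    """Convert the rows of a simple (non-nested) HTML table into CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample markup standing in for the scraped table element.
SAMPLE = """
<table>
  <tr><th>date</th><th>pm25</th></tr>
  <tr><td>2022-01-01</td><td>57</td></tr>
  <tr><td>2022-01-02</td><td>61</td></tr>
</table>
"""
csv_text = table_html_to_csv(SAMPLE)
```

The resulting text can be written to a file, or handed to pandas with `pandas.read_csv(io.StringIO(csv_text))` if a dataframe is wanted.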

Milind220 commented 2 years ago

@Sam-damn Ahhhh I know what you mean by the dialog box:

Milind220 commented 2 years ago

@Sam-damn Now that you mention it, I think webscraping the table element is genius! It appears to be the easiest solution to this problem. I checked it out for a few locations, and the table is 100% complete for all the parameters.

Great thinking man!

Let's try this:

1. Use Requests to fill up the form, if possible. That way we don't have to worry about adding Selenium WebDriver as a dependency.
2. Web-scrape the table. I've got some experience with this.
3. Create a pandas dataframe with the scraped data. Then we can use the _format_data method to get it into whatever format the user desires.

Milind220 commented 2 years ago

@Sam-damn Actually, if you manage to webscrape the table with Selenium, that's fine too. I suppose we can ask users to download the WebDriver on their own, or perhaps set up a shell script to download it separately (idk if that's possible, but just an idea).

EDIT: I found this package which could help us out with the WebDriver part. It downloads the WebDriver on the spot, which would allow us to add selenium as a regular dependency.

Samxx97 commented 2 years ago

Alright, I will focus on the scraping then and keep you updated. Also, what a coincidence: I actually came across that package two days ago and have been using it. It's quite handy!

Milind220 commented 2 years ago

@Sam-damn hahaha that's great. Let me know how it goes!

Milind220 commented 2 years ago

@Sam-damn Any progress with that?

This is a pretty exciting feature for us to add - it would create a lot of opportunity for expansion and usage of the package. Historical data is very important for researchers, and this would make it really simple for them to get data. I'm hoping to get some professors from my university to use it if we can get this to work!

Samxx97 commented 2 years ago

@Milind220 It's almost finished! I got it working nicely now, and I tested it a lot, so hopefully it will work for everyone. Currently I am just organizing the file, making it more readable, and adding the docs (for the methods and the class), the packages, etc. And yes, it was quite challenging to get it to work, and quite fun.

Also, I apologize for the delay. We have a pretty bad electricity situation here, so I have been working on it whenever I could 👍🏻👍🏻

Milind220 commented 2 years ago

@Sam-damn No problem at all! Your work has been top notch :)

Milind220 commented 2 years ago

@Sam-damn Hey, check your email!

Samxx97 commented 2 years ago

@Milind220 I sent a reply 😁

Ozon3Org / Ozon3

Add feature to get historical data for a particular location #3