OmdenaAI / trieste-italy-long-covid

GNU General Public License v3.0

hdf dataframe is void (no data inside) #1

Closed lucapug closed 2 years ago

lucapug commented 2 years ago

When executing LongCovidDownload.ipynb, I first needed to install wget and snscrape (I added a cell at the beginning for this). All the following cells then run without errors (I only had to change the path where the final csv file is saved). The issue concerns the hdf dataframe created at the end of the notebook: if I check its contents by executing hdf.head(), there are no rows inside the dataframe. It seems that no tweets were scraped.

[Screenshot attached: colab.research.google.com, 2022-07-07 23:04:16]
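A check along these lines, placed right after the cell that builds hdf, would make the empty result fail loudly instead of silently writing an empty csv. This is only a sketch: `require_rows` is a hypothetical helper that is not part of the notebook, and it relies only on `len()`, which returns the number of rows for a pandas DataFrame.

```python
def require_rows(frame, name="hdf"):
    """Return the row count of `frame`, or raise if it is empty.

    Hypothetical helper (not in the notebook). Works for any object
    that supports len(), including a pandas DataFrame.
    """
    n_rows = len(frame)
    if n_rows == 0:
        # An empty frame here suggests that no tweets were hydrated.
        raise ValueError(f"{name} is empty: no tweets were hydrated")
    return n_rows
```

In the notebook this could be called as `require_rows(hdf)` before the csv is saved.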

now-youre-gittin-it commented 2 years ago

Hi Luca, although I have not worked on this code yet I can try to guess as to what the issue is. I have faced issues in the past where the function is not performing the intended task even though the syntax is correct and base requirements are met. In this case, it seems hydrate_tweet() is not working, which is why the hdf dataframe may be empty. Is it confirmed that all requirements for hydrate_tweet() to work are met (e.g., if any additional context/credentials/parameters are needed)?

lucapug commented 2 years ago

Hi, it seems the problem depends on the development environment in which I run the code. In my first attempt I used Colab (Python 3.7.13, the default runtime), and TwitterTweetScraper() in the hydrate function is probably not recognized correctly there. Today I ran the notebook in a local environment with a Python 3.8.5 interpreter, and this time the csv file was created correctly with the scraped tweets. So maybe the issue can be considered closed? Maybe the README could include some hints about the correct environment (interpreter version, etc.).

now-youre-gittin-it commented 2 years ago

Wonderful, it's great that the code worked in a different environment! It's a good lesson for us about how results can differ across environments. We can definitely document the exact environment used for the code so that it runs successfully on other machines. Maybe we can do this at the end of the project so that we have a uniform specification format for everyone's code.
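One possible way to capture the environment for the README is a small report of the interpreter and package versions. This is a sketch under assumptions: `describe_environment` is a hypothetical helper, the default package list is a guess (snscrape and wget come from this thread, pandas is assumed), and `importlib.metadata` needs Python 3.8 or newer, so it would not run on the Python 3.7.13 Colab runtime mentioned above.

```python
import platform
from importlib import metadata


def describe_environment(packages=("snscrape", "wget", "pandas")):
    """Collect the Python version and installed versions of `packages`.

    Hypothetical helper; the default package names are assumptions.
    """
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages instead of failing the whole report.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in both the Colab and the local environment would show which versions differ between the run that failed and the run that worked.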