engagingnewsproject / misinfo-dashboard

Import Reports from Leonard #58

Open luukee opened 2 weeks ago

luukee commented 2 weeks ago

Leonard has prepared Google Sheets of newsroom reports we need to import into our production Firestore database.

Leonard and Luke's email thread:

Leonard:

Greetings, Luke. I hope this email finds you well!

I have attached a spreadsheet showing how I am transferring data from Marquee (raw data from the scraping firm) to the firebase here. The R code I am using is linked here.

The way the data will look for the firebase is in the tab labeled "Mock Firebase Columns," which will have the same labels in the same order as you do in the firebase sheet.

The only data that will show up in "Mock Firebase Columns" is data that the machine learning algorithm classifies as true.

The tabs show an example of how two websites would show up in the firebase as they go through the classification process.

Please let me know if this is what you were hoping to see. If anything is off, please let me know and I would be more than happy to get on it. I appreciate you!

Luke:

Hi Leonard, all is well here, thank you. Hope the same for you!

Thank you for sending over the info.

I forget, did you send over an example of the scraped data? I would like to test your script against the data we will get from the firm. On that note, I'm sure you have tested it already; if so, please send over the exported .csv for me to review.

Let me know, thanks!

Leonard:

I appreciate your email, Luke! I am sorry this is a bit long, but I hope this addresses your questions and concerns.

I have only tested on the mock example I created in the Google Sheet I linked earlier, because we do not yet have the code to classify text as true or false. That Google Sheet uses the R code I linked to get from one tab to the next. The one exception is the step from the Mock Classifier Columns (before classification) tab to the Mock Classifier Columns (after classification) tab, because there is no classification code yet. In that sense, the Mock Firebase Columns tab is what the exported .csv should look like if we start with columns that mirror what we get from the scraping firm: I ran the R code, then copied and pasted the results from each new .csv into the corresponding Google Sheet tab.

I have attached an example .csv of scraped data to this email (news_2024-08-26 1.csv). If you would like to test it, I would recommend taking only a sample of the data, because the classification needs to be done manually for now given the lack of classification code. The columns in this data mirror the data structure in the Mock Scrape Columns tab of the Google Sheet I linked.

The first part of the script should get you to a sheet that looks like the Mock Classifier Columns (before classification) tab. I do not have the classification code yet, so no code will get you from that tab to the Mock Classifier Columns (after classification) tab. You may need to enter some TRUE and FALSE responses manually into the CRITERIA_RATED_BY_ME column and the state columns (e.g., Arizona, Nevada, Wisconsin). These TRUE/FALSE values can be entered at random. The second part of the code should take you from the Mock Classifier Columns (after classification) tab to something that looks like the Mock Firebase Columns tab.

I plan to check and run the entire .csv through my updated code once we get the classification code, and I will send you the results as soon as I have them.
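For anyone testing the second step in JavaScript rather than R, here is a minimal sketch of the "keep only rows classified as true" rule. It assumes the rows have already been parsed into plain objects keyed by column name; the helper names `isTrue` and `keepClassifiedRows` are made up for illustration and are not part of Leonard's R script.

```javascript
// Sketch only: keep the rows whose rating column holds TRUE.
// Assumes rows are already parsed from the .csv into plain objects.

// CSV cells arrive as text, so accept "TRUE", "true", " True " and a real boolean.
function isTrue(value) {
  return value === true || String(value).trim().toUpperCase() === "TRUE";
}

// ratingColumn defaults to the column named in the email above.
function keepClassifiedRows(rows, ratingColumn = "CRITERIA_RATED_BY_ME") {
  return rows.filter((row) => isTrue(row[ratingColumn]));
}
```

Blank or missing cells are treated as not true, so unrated rows are dropped rather than imported.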

Please let me know if you have any questions, comments, or concerns. Many thanks!

Luke:

This is great Leonard. Thank you for outlining everything.

I thought it would be helpful to run an export of the live site's reports collection (only "Test Agency" reports) to .csv for you to look at. I created a new tab on your Scrape to Firebase Data Transfer Gsheet named "Firebase Export EXAMPLE" with the raw export data if you want to have a look.

For the export I used the Firebase Admin SDK and Node.js to:

1. Export the data as a .json file
2. Convert the .json file into a .csv
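The conversion step could look roughly like the sketch below. It is not the script Luke used. It assumes the export is an array of flat report objects, and the function names `escapeCsvField` and `jsonToCsv` are hypothetical. Nested values such as timestamps or arrays are simply serialized with `JSON.stringify`.

```javascript
// Sketch only: turn an array of exported report objects into CSV text.

// Quote a field when it contains a comma, a double quote, or a line break.
function escapeCsvField(value) {
  if (value === null || value === undefined) return "";
  const text = typeof value === "object" ? JSON.stringify(value) : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function jsonToCsv(records) {
  // Collect the union of keys in first-seen order so sparse documents still line up.
  const columns = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const header = columns.map(escapeCsvField).join(",");
  const lines = records.map((record) =>
    columns.map((column) => escapeCsvField(record[column])).join(",")
  );
  return [header, ...lines].join("\n");
}
```

Because the header is built from the union of keys, column order follows whichever fields appear first in the export, which may explain why an export does not match a hand-built sheet column for column.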

Let me know if you have any questions. Thanks!

Leonard:

Sounds great! I have one question.

Does the order of the columns matter? I tried to match the column names and order to what I saw here, but the ordering of the columns appears to be different in the updated sheet. Please let me know if you only want the column names to be the same or if you want them to be named the same and in a particular order.

Please let me know if you have any further questions, comments, or concerns. Many thanks!

Luke:

Great question! The order does not matter. I left some comments on the Mock Firebase Columns Sheet.

Let me know if you have any questions, thanks!

Leonard:

Thank you for the comments on the document, Luke. They were very helpful!

Here is the updated R code and the updated sheet, in the tab "(2.0) Mock Firebase Columns." There will be articles that speak to multiple states, so I have split the rows so that each row has only one state. I assume the state column is being used to identify which news station(s) an article should be sent to, so hopefully this fix will allow the data to be compatible with firebase while also allowing news stations to get the articles relevant to them.
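As an illustration of the row split described above, here is a minimal JavaScript sketch. It is not Leonard's R code. It assumes each row carries one TRUE/FALSE column per state, as in the classifier tabs, and writes the result to a single `state` field; that output field name and the function name `splitRowsByState` are assumptions.

```javascript
// Sketch only: expand each article row into one row per flagged state.
function splitRowsByState(rows, stateColumns) {
  const result = [];
  for (const row of rows) {
    // Copy every field except the per-state flag columns.
    const base = {};
    for (const [key, value] of Object.entries(row)) {
      if (!stateColumns.includes(key)) base[key] = value;
    }
    for (const state of stateColumns) {
      // Cells may be text ("TRUE") or real booleans, so normalize first.
      const flagged = String(row[state]).trim().toUpperCase() === "TRUE";
      if (flagged) result.push({ ...base, state });
    }
  }
  return result;
}
```

An article flagged for two states produces two otherwise identical rows, and an article flagged for none produces no rows.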

I have removed the input from the columns that do not have any input, including agency. I thought that the base URL might signal an agency, but clearly that will not always be the case.

Please let me know if there are some issues you see, and I would be more than happy to get on it. Thank you for all your help, and I wish you a wonderful day ahead!

Luke:

Hi Leonard, thank you for asking! Yes, you can just create a new sheet on the Google sheet you mentioned. That will work perfectly.

Let me know if you have any other questions, thanks for working on this!

Leonard:

Got it. The 10/28 articles are pasted in the newly created 10/28 tab of that Google sheet. I will paste the 10/29 articles later when that is finished in a new 10/29 tab.

Please let me know if you have any questions, comments, or concerns. Many thanks!

Leonard:

The 10/29 articles are pasted in the 10/29 tab.

I don't know who I should be emailing about this, so please let me know if you do not need these notifications moving forward.

Please let me know if you have any other questions, comments, or concerns. Many thanks!

Leonard:

The 10/30 articles are pasted in the 10/30 tab.

Please let me know if you have any other questions, comments, or concerns. Again, please let me know if you do not need these notifications moving forward. Many thanks!

Leonard:

The 10/31 articles are pasted in the 10/31 tab.

Please let me know if you have any other questions, comments, or concerns. Many thanks!

Leonard:

The 11/01 articles are pasted in the 11/01 tab.

I have added an extra article to the set of five Arizona articles because the initial five were from the same source. The extra article was not in the top six with regard to the proportion of Arizona-related terms, but was the first article not from the source the other five were from. You can see more on this here in the FULLPERCENTAGE_11_01 tab.
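The selection rule described here can be sketched as follows. This is only an illustration of the idea, not the code behind the FULLPERCENTAGE tab. The field names `source` and `proportion` and the function name `pickTopArticles` are assumptions.

```javascript
// Sketch only: take the top `count` articles by proportion of state-related terms.
// If they all share one source, append the best-ranked article from another source.
function pickTopArticles(articles, count = 5) {
  // Sort a copy so the caller's array is left untouched.
  const ranked = [...articles].sort((a, b) => b.proportion - a.proportion);
  const top = ranked.slice(0, count);
  const sources = new Set(top.map((article) => article.source));
  if (top.length > 0 && sources.size === 1) {
    const extra = ranked
      .slice(count)
      .find((article) => article.source !== top[0].source);
    if (extra) top.push(extra);
  }
  return top;
}
```

The extra article is skipped when the top set already mixes sources, or when no article from a different source exists.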

Please let me know if you have any questions, comments, or concerns. Many thanks!

luukee commented 2 weeks ago

@EthanL06 I started the import feature in the ReportsSection.jsx file: an Import button modeled on your Export btn, plus a handleCSVImport function.

It works locally on the emulator with this .csv import, but on the dev site it was not adding the agency for each report.

.csv to test on the emulator (this sheet has already been imported into the production db): FINAL PERCENTAGES - TOP5_10_PERCENTAGE_10_28.csv
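One thing that may be worth checking, since Leonard's sheets leave the agency column blank: whether each parsed row ends up with an agency before it is written. The sketch below is not the existing handleCSVImport code and is not a confirmed diagnosis of the dev-site behavior. It assumes rows are already parsed into objects with an `agency` field, and the names `prepareReports` and `defaultAgency` are hypothetical.

```javascript
// Sketch only: fill in a fallback agency and report the rows that still lack one.
function prepareReports(rows, defaultAgency) {
  const ready = [];
  const missingAgency = [];
  rows.forEach((row, index) => {
    // Prefer the value from the sheet, then fall back to the caller-supplied agency.
    const agency = String(row.agency ?? "").trim() || defaultAgency || "";
    if (agency) {
      ready.push({ ...row, agency });
    } else {
      // +2 converts a 0-based index to a sheet row number (row 1 is the header).
      missingAgency.push(index + 2);
    }
  });
  return { ready, missingAgency };
}
```

Logging `missingAgency` on the dev site before any write would show whether the rows arrive without an agency or lose it later in the import.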

Let me know what you find.