GeneralMills / pytrends

Pseudo API for Google Trends

Updating tests expected DataFrames is a chore #569

Open Terseus opened 1 year ago

Terseus commented 1 year ago

In #541 I introduced a test suite based on VCR.py cassettes. Over time, however, I've learned that the Google Trends API returns different data for information that should already be consolidated (e.g. search terms for 2021). For example, if we try to update the cassette of the test test_interest_over_time_ok right now, we get this error:

E   AssertionError: DataFrame.iloc[:, 0] (column name="pizza") are different
E
E   DataFrame.iloc[:, 0] (column name="pizza") values are different (60.0 %)
E   [index]: [2021-01-01T00:00:00.000000000, 2021-01-02T00:00:00.000000000, 2021-01-03T00:00:00.000000000, 2021-01-04T00:00:00.000000000, 2021-01-05T00:00:00.000000000]
E   [left]:  [100, 81, 78, 48, 51]
E   [right]: [100, 84, 78, 50, 52]

My approach to this problem was to document how to replace these results in the repository's contributing guidelines. However, while trying to fix #566 I found that I have to update all the cassettes one by one; it's a big, time-consuming chore that few contributors will (reasonably) be willing to do.

Ideally, we should have a system that automates the update of the expected DataFrames while still allowing the user to inspect each new result and judge whether it's valid (e.g. if a bad implementation produces all zeroes, we should know before replacing all the expected DataFrames).

I propose a system to update the cassettes almost automatically by leveraging the management of the DataFrame responses in a pytest fixture:

There may be some details I'm missing right now, but that's the main idea.
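A minimal sketch of how such a fixture could look, not the actual implementation. Everything named here is an assumption made for illustration: the --update-expected flag, the tests/expected directory, the check_or_stage helper, the expected_frame fixture, and the choice of pickle as the storage format (picked only because it round-trips dtypes and indexes exactly). In update mode a changed result is staged next to the stored one instead of overwriting it, so it can be inspected first.

```python
from pathlib import Path

import pandas as pd
import pytest

EXPECTED_DIR = Path("tests/expected")  # hypothetical location


def check_or_stage(name, actual, expected_dir=EXPECTED_DIR, update=False):
    """Assert that `actual` matches the stored frame, or stage a replacement.

    In update mode a mismatch does not fail the test: the new frame is
    written to "<name>.new.pkl" so it can be reviewed before it replaces
    the stored one. Returns the staged path, or None if nothing was staged.
    """
    stored_path = Path(expected_dir) / f"{name}.pkl"
    if stored_path.exists():
        stored = pd.read_pickle(stored_path)
        try:
            pd.testing.assert_frame_equal(actual, stored)
            return None  # still up to date
        except AssertionError:
            if not update:
                raise  # normal run: report the pandas diff
    elif not update:
        raise AssertionError(f"no expected DataFrame stored for {name!r}")

    # Update mode: stage the new result; the stored file is left untouched.
    staged_path = stored_path.with_name(f"{name}.new.pkl")
    staged_path.parent.mkdir(parents=True, exist_ok=True)
    actual.to_pickle(staged_path)
    return staged_path


def pytest_addoption(parser):
    # Hypothetical flag; pytest only picks this hook up from conftest.py.
    parser.addoption("--update-expected", action="store_true", default=False)


@pytest.fixture
def expected_frame(request):
    """Return a checker bound to the name of the current test."""
    update = request.config.getoption("--update-expected")

    def check(actual):
        return check_or_stage(request.node.name, actual, update=update)

    return check
```

A test would then request expected_frame and call it with the DataFrame it produced. Running the suite once with --update-expected would stage every changed result as a .new.pkl file, and a reviewer could load those files, rule out obviously broken output such as all zeroes, and only then rename them over the stored ones.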

Please @emlazzarin tell me if it looks good to you and I'll implement it.

emlazzarin commented 1 year ago

Yeah, this sounds right. I didn't realize the tests functioned that way either. Thanks for taking a stab at it.