Closed Weyaaron closed 9 months ago
CI passes on master so i guess it's something on your side. Could you provide the entire pytest
report?
I am not quite sure if this is what you asked for, but I hope it helps anyway. I already took a look at possible python version differences, but I couldn't find any in https://docs.python.org/3/library/datetime.html
/home/aaron/.conda/envs/fundus/bin/python /opt/pycharm-professional/plugins/python/helpers/pycharm/_jb_pytest_runner.py --path /home/aaron/Code/Python/Fundus/tests
Testing started at 10:39 ...
Launching pytest with arguments /home/aaron/Code/Python/Fundus/tests --no-header --no-summary -q in /home/aaron/Code/Python/Fundus
============================= test session starts ==============================
collecting ... collected 116 items
tests/test_collection.py::TestCollection::test_iter_empty_collection
tests/test_collection.py::TestCollection::test_iter_collection_with_empty_publisher_enum
tests/test_collection.py::TestCollection::test_iter_collection_with_publisher_enum
tests/test_collection.py::TestCollection::test_publisher_enum_with_wrong_enum_value
tests/test_collection.py::TestCollection::test_publisher_spec_without_source
tests/test_collection.py::TestCollection::test_duplicate_publisher_names_in_same_collection
tests/test_collection.py::TestCollection::test_supports
tests/test_collection.py::TestCollection::test_search
tests/test_parser.py::test_supported_publishers_table PASSED [ 0%]PASSED [ 1%]PASSED [ 2%]PASSED [ 3%]PASSED [ 4%]PASSED [ 5%]PASSED [ 6%]PASSED [ 6%]
tests/test_parser.py::TestBaseParser::test_empty_parser
tests/test_parser.py::TestBaseParser::test_functions_iter
tests/test_parser.py::TestBaseParser::test_attributes_iter
tests/test_parser.py::TestBaseParser::test_supported_unsupported
tests/test_parser.py::TestParserProxy::test_empty_proxy
tests/test_parser.py::TestParserProxy::test_proxy_with_same_date
tests/test_parser.py::TestParserProxy::test_len
tests/test_parser.py::TestParserProxy::test_iter
tests/test_parser.py::TestParserProxy::test_latest
tests/test_parser.py::TestParserProxy::test_call
tests/test_parser.py::TestParserProxy::test_mapping
tests/test_parser.py::TestParser::test_annotations[DieWelt]
tests/test_parser.py::TestParser::test_annotations[MDR]
tests/test_parser.py::TestParser::test_annotations[FAZ]
tests/test_parser.py::TestParser::test_annotations[Focus]
tests/test_parser.py::TestParser::test_annotations[Merkur]
tests/test_parser.py::TestParser::test_annotations[SZ]
tests/test_parser.py::TestParser::test_annotations[SpiegelOnline]
tests/test_parser.py::TestParser::test_annotations[DieZeit]
tests/test_parser.py::TestParser::test_annotations[BerlinerZeitung]
tests/test_parser.py::TestParser::test_annotations[Tagesschau]
tests/test_parser.py::TestParser::test_annotations[DW]
tests/test_parser.py::TestParser::test_annotations[Stern]
tests/test_parser.py::TestParser::test_annotations[NTV]
tests/test_parser.py::TestParser::test_annotations[NDR]
tests/test_parser.py::TestParser::test_annotations[Taz]
tests/test_parser.py::TestParser::test_annotations[Bild]
tests/test_parser.py::TestParser::test_annotations[WAZ]
tests/test_parser.py::TestParser::test_annotations[ORF]
tests/test_parser.py::TestParser::test_annotations[APNews]
tests/test_parser.py::TestParser::test_annotations[CNBC]
tests/test_parser.py::TestParser::test_annotations[TheIntercept]
tests/test_parser.py::TestParser::test_annotations[TheGatewayPundit]
tests/test_parser.py::TestParser::test_annotations[FoxNews]
tests/test_parser.py::TestParser::test_annotations[TheNation]
tests/test_parser.py::TestParser::test_annotations[WorldTruth]
tests/test_parser.py::TestParser::test_annotations[FreeBeacon]
tests/test_parser.py::TestParser::test_annotations[WashingtonTimes]
tests/test_parser.py::TestParser::test_annotations[TheNewYorker]
tests/test_parser.py::TestParser::test_annotations[Reuters]
tests/test_parser.py::TestParser::test_annotations[OccupyDemocrats]
tests/test_parser.py::TestParser::test_annotations[TheGuardian]
tests/test_parser.py::TestParser::test_parsing[DieWelt]
tests/test_parser.py::TestParser::test_parsing[MDR]
tests/test_parser.py::TestParser::test_parsing[FAZ]
tests/test_parser.py::TestParser::test_parsing[Focus]
tests/test_parser.py::TestParser::test_parsing[Merkur]
tests/test_parser.py::TestParser::test_parsing[SZ]
tests/test_parser.py::TestParser::test_parsing[SpiegelOnline]
tests/test_parser.py::TestParser::test_parsing[DieZeit]
tests/test_parser.py::TestParser::test_parsing[BerlinerZeitung]
tests/test_parser.py::TestParser::test_parsing[Tagesschau]
tests/test_parser.py::TestParser::test_parsing[DW]
tests/test_parser.py::TestParser::test_parsing[Stern]
tests/test_parser.py::TestParser::test_parsing[NTV]
tests/test_parser.py::TestParser::test_parsing[NDR]
tests/test_parser.py::TestParser::test_parsing[Taz]
tests/test_parser.py::TestParser::test_parsing[Bild]
tests/test_parser.py::TestParser::test_parsing[WAZ]
tests/test_parser.py::TestParser::test_parsing[ORF]
tests/test_parser.py::TestParser::test_parsing[APNews]
tests/test_parser.py::TestParser::test_parsing[CNBC]
tests/test_parser.py::TestParser::test_parsing[TheIntercept]
tests/test_parser.py::TestParser::test_parsing[TheGatewayPundit]
tests/test_parser.py::TestParser::test_parsing[FoxNews]
tests/test_parser.py::TestParser::test_parsing[TheNation]
tests/test_parser.py::TestParser::test_parsing[WorldTruth]
tests/test_parser.py::TestParser::test_parsing[FreeBeacon]
tests/test_parser.py::TestParser::test_parsing[WashingtonTimes]
tests/test_parser.py::TestParser::test_parsing[TheNewYorker]
tests/test_parser.py::TestParser::test_parsing[Reuters]
tests/test_parser.py::TestParser::test_parsing[OccupyDemocrats]
tests/test_parser.py::TestParser::test_parsing[TheGuardian]
tests/test_parser.py::TestParser::test_reserved_attribute_names[DieWelt]
tests/test_parser.py::TestParser::test_reserved_attribute_names[MDR]
tests/test_parser.py::TestParser::test_reserved_attribute_names[FAZ]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Focus]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Merkur]
tests/test_parser.py::TestParser::test_reserved_attribute_names[SZ]
tests/test_parser.py::TestParser::test_reserved_attribute_names[SpiegelOnline]
tests/test_parser.py::TestParser::test_reserved_attribute_names[DieZeit]
tests/test_parser.py::TestParser::test_reserved_attribute_names[BerlinerZeitung]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Tagesschau]
tests/test_parser.py::TestParser::test_reserved_attribute_names[DW]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Stern]
tests/test_parser.py::TestParser::test_reserved_attribute_names[NTV]
tests/test_parser.py::TestParser::test_reserved_attribute_names[NDR]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Taz]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Bild]
tests/test_parser.py::TestParser::test_reserved_attribute_names[WAZ]
tests/test_parser.py::TestParser::test_reserved_attribute_names[ORF]
tests/test_parser.py::TestParser::test_reserved_attribute_names[APNews]
tests/test_parser.py::TestParser::test_reserved_attribute_names[CNBC]
tests/test_parser.py::TestParser::test_reserved_attribute_names[TheIntercept]
tests/test_parser.py::TestParser::test_reserved_attribute_names[TheGatewayPundit]
tests/test_parser.py::TestParser::test_reserved_attribute_names[FoxNews]
tests/test_parser.py::TestParser::test_reserved_attribute_names[TheNation]
tests/test_parser.py::TestParser::test_reserved_attribute_names[WorldTruth]
tests/test_parser.py::TestParser::test_reserved_attribute_names[FreeBeacon]
tests/test_parser.py::TestParser::test_reserved_attribute_names[WashingtonTimes]
tests/test_parser.py::TestParser::test_reserved_attribute_names[TheNewYorker]
tests/test_parser.py::TestParser::test_reserved_attribute_names[Reuters]
tests/test_parser.py::TestParser::test_reserved_attribute_names[OccupyDemocrats]
tests/test_parser.py::TestParser::test_reserved_attribute_names[TheGuardian]
tests/test_parser.py::TestUtility::test_generic_author_parsing
tests/test_pipeline.py::TestPipeline::test_crawler_with_empty_collection PASSED [ 7%]PASSED [ 8%]PASSED [ 9%]PASSED [ 10%]PASSED [ 11%]PASSED [ 12%]PASSED [ 12%]PASSED [ 13%]PASSED [ 14%]PASSED [ 15%]Later
PASSED [ 16%]PASSED [ 17%]PASSED [ 18%]PASSED [ 18%]PASSED [ 19%]PASSED [ 20%]PASSED [ 21%]PASSED [ 22%]PASSED [ 23%]PASSED [ 24%]PASSED [ 25%]PASSED [ 25%]PASSED [ 26%]PASSED [ 27%]PASSED [ 28%]PASSED [ 29%]PASSED [ 30%]PASSED [ 31%]PASSED [ 31%]PASSED [ 32%]PASSED [ 33%]PASSED [ 34%]PASSED [ 35%]PASSED [ 36%]PASSED [ 37%]PASSED [ 37%]PASSED [ 38%]PASSED [ 39%]PASSED [ 40%]PASSED [ 41%]PASSED [ 42%]PASSED [ 43%]PASSED [ 43%]PASSED [ 44%]PASSED [ 45%]FAILED [ 46%]
tests/test_parser.py:142 (TestParser.test_parsing[FAZ])
datetime.datetime(2023, 4, 28, 19, 20, 36) != datetime.datetime(2023, 4, 28, 19, 20, 36, tzinfo=tzlocal())
Expected :datetime.datetime(2023, 4, 28, 19, 20, 36, tzinfo=tzlocal())
Actual :datetime.datetime(2023, 4, 28, 19, 20, 36)
<Click to see difference>
self = <tests.test_parser.TestParser object at 0x7ff17d70d280>
publisher = <DE.FAZ: 3>
def test_parsing(self, publisher: PublisherEnum) -> None:
# enforce test coverage
attrs_required_to_cover = {"title", "authors", "topics", "publishing_date"}
comparative_data = load_test_case_data(publisher)
html_mapping = load_html_test_file_mapping(publisher)
for versioned_parser in publisher.parser:
# validate json
version_name = versioned_parser.__name__
assert (
version_data := comparative_data.get(version_name)
), f"Missing test data for parser version '{version_name}'"
for key, value in version_data.items():
if not value:
raise ValueError(
f"There is no value set for key '{key}' in the test JSON. "
f"Only complete articles should be used as test cases"
)
# test coverage
supported_attrs = set(versioned_parser.attributes().names)
missing_attrs = attrs_required_to_cover & supported_attrs - set(version_data.keys())
assert not missing_attrs, f"Test JSON does not cover the following attribute(s): {missing_attrs}"
assert (html := html_mapping.get(versioned_parser)), f"Missing test HTML for parser version {version_name}"
# compare data
extraction = versioned_parser().parse(html.content, "raise")
for key, value in version_data.items():
> assert value == extraction[key]
E assert datetime.datetime(2023, 4, 28, 19, 20, 36) == datetime.datetime(2023, 4, 28, 19, 20, 36, tzinfo=tzlocal())
tests/test_parser.py:173: AssertionError
PASSED [ 47%]PASSED [ 48%]PASSED [ 49%]PASSED [ 50%]PASSED [ 50%]PASSED [ 51%]PASSED [ 52%]PASSED [ 53%]PASSED [ 54%]PASSED [ 55%]PASSED [ 56%]PASSED [ 56%]PASSED [ 57%]PASSED [ 58%]PASSED [ 59%]PASSED [ 60%]PASSED [ 61%]PASSED [ 62%]PASSED [ 62%]PASSED [ 63%]PASSED [ 64%]PASSED [ 65%]PASSED [ 66%]PASSED [ 67%]PASSED [ 68%]PASSED [ 68%]PASSED [ 69%]PASSED [ 70%]PASSED [ 71%]PASSED [ 72%]PASSED [ 73%]PASSED [ 74%]PASSED [ 75%]PASSED [ 75%]PASSED [ 76%]PASSED [ 77%]PASSED [ 78%]PASSED [ 79%]PASSED [ 80%]PASSED [ 81%]PASSED [ 81%]PASSED [ 82%]PASSED [ 83%]PASSED [ 84%]PASSED [ 85%]PASSED [ 86%]PASSED [ 87%]PASSED [ 87%]PASSED [ 88%]PASSED [ 89%]PASSED [ 90%]PASSED [ 91%]PASSED [ 92%]PASSED [ 93%]PASSED [ 93%]PASSED [ 94%]PASSED [ 95%]PASSED [ 96%]PASSED [ 97%]PASSED [ 98%]
tests/test_pipeline.py::TestPipeline::test_crawler_with_collection
======================== 1 failed, 115 passed in 0.60s =========================
PASSED [ 99%]PASSED [100%]
Process finished with exit code 1
Thanks for clarifying the title, I will try to provide additional information/provide reproducibility.
I can confirm that this happens on a clean install and python3.8 as well as 3.9. I used the following versions:
(fundus_3_8) aaron:Fundus/ (master✗) $ pip freeze [11:54:23]
aiohttp==3.8.4
aiosignal==1.3.1
async-timeout==4.0.2
attrs==23.1.0
certifi==2023.5.7
charset-normalizer==3.1.0
click==8.1.3
colorama==0.4.6
cssselect==1.1.0
decorator==5.1.1
dill==0.3.5.1
dotmap==1.3.30
exceptiongroup==1.1.2
FastWARC==0.12.2
feedparser==6.0.10
frozenlist==1.3.3
ftfy==6.1.1
-e git+ssh://git@github.com/flairNLP/fundus.git@92bef68d23f781d9ae4bba824927c9a06414b946#egg=fundus
idna==3.4
iniconfig==2.0.0
langdetect==1.0.9
lxml==4.9.2
more-itertools==9.1.0
multidict==6.0.4
packaging==23.1
pluggy==1.2.0
pytest==7.4.0
python-dateutil==2.8.2
requests==2.28.2
sgmllib3k==1.0.0
six==1.16.0
tomli==2.0.1
tqdm==4.65.0
typing_extensions==4.7.1
urllib3==1.26.16
validators==0.20.0
wcwidth==0.2.6
yarl==1.9.2
I had a go at this, and it is weird. The problem has been narrowed down a lot. The article does provide timezone-data, but our JSON does not. These dates are parsed using two different methods. Maybe we should change this?
Anyway, the main problem is: This test should fail everywhere, this is my reasoning:
The HTML contains data which indicates a timezone, which is successfully recognized by the parser. The JSON data does not contain a timezone, which is recognized by the parser as well.
Therefore, the values should not compare equal, as stated by the documentation(https://docs.python.org/3/library/datetime.html):
If one comparand is naive and the other is aware, [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is raised if an order comparison is attempted. For equality comparisons, naive instances are never equal to aware instances.
But this does not fail the CI. How?
I will gather additional data once we meet in person.
Here is a small update from my side: Running pytest
on a fresh installation, I get the same error as @Weyaaron.
FAILED tests/test_parser.py::TestParser::test_parsing[FAZ] - assert datetime.datetime(2023, 4, 28, 19, 20, 36) == datetime.datetime(2023, 4, 28, 19, 20, 36, tzinfo=tzlocal())
Exact system:
Operating System: openSUSE Leap 15.3
CPE OS Name: cpe:/o:opensuse:leap:15.3
Kernel: Linux 5.3.18-150300.59.106-default
Architecture: x86-64
I can confirm that act(A tool that runs the CI locally) acts the same. It does not fail with the current setup, although it should. This is further evidence that the CI is not working as intended. Maybe we should write a bug report? But I am not sure to whom.
This was fixed with 19766e3f6c34304265140bbbd3f202c92613b5e1
I don't quite understand how our CI missed this one:
This error happens in the test for the FAZ. I am unsure how to solve this.