dagster-io / fake-star-detector

https://github.com/dagster-io/dagster
234 stars, 19 forks

simpler_model classified_stargazers_df error #11

Open redeux opened 1 year ago

redeux commented 1 year ago

Running the simpler_model against a repo with 30k stars produced the following error on entering the classified_stargazers_df stage, after roughly 6 hours of execution.

dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "classified_stargazers_df":

  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 273, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 481, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 164, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 95, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 203, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn):
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 172, in _yield_compute_results
    for event in iterate_with_context(
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_utils/__init__.py", line 448, in iterate_with_context
    return
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
TypeError: Cannot compare Timestamp with datetime.date. Use ts == pd.Timestamp(date) or ts.date() == date instead.

  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_utils/__init__.py", line 446, in iterate_with_context
    next_output = next(iterator)
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 126, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 120, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/fake-star-detector/fake_star_detector/assets/simpler_model.py", line 186, in classified_stargazers_df
    stargazers_with_user_info["matches_fake_heuristic"] = stargazers_with_user_info.apply(
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/pandas/core/frame.py", line 10037, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/pandas/core/apply.py", line 831, in apply
    return self.apply_standard()
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/pandas/core/apply.py", line 957, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/fake-star-detector/venv/lib64/python3.9/site-packages/pandas/core/apply.py", line 973, in apply_series_generator
    results[i] = self.func(v, *self.args, **self.kwargs)
  File "/fake-star-detector/fake_star_detector/assets/simpler_model.py", line 206, in _validate_star
    and (row["created_at"] > datetime.date(2022, 1, 1))
  File "timestamps.pyx", line 378, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
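
The failing frame is the comparison in _validate_star between row["created_at"] (a pandas Timestamp) and datetime.date(2022, 1, 1). Below is a minimal sketch of the two workarounds the error message itself suggests. It assumes created_at holds pandas Timestamps; the helper name created_after_cutoff and the toy DataFrame are hypothetical stand-ins, not code from the repo.

```python
import datetime

import pandas as pd

# Cutoff value taken from the traceback: datetime.date(2022, 1, 1).
CUTOFF = datetime.date(2022, 1, 1)


def created_after_cutoff(
    created_at: pd.Timestamp, cutoff: datetime.date = CUTOFF
) -> bool:
    """Hypothetical helper: compare date to date, so no Timestamp/date mix occurs."""
    return created_at.date() > cutoff


# Toy frame standing in for stargazers_with_user_info; the values are made up.
df = pd.DataFrame(
    {"created_at": pd.to_datetime(["2021-06-15", "2022-03-01", "2023-01-10"])}
)

# Option 1: row-wise check, mirroring the apply(...) style seen in the traceback.
row_wise = df["created_at"].apply(created_after_cutoff)

# Option 2: vectorized check, converting the cutoff to a Timestamp once.
vectorized = df["created_at"] > pd.Timestamp(CUTOFF)

print(row_wise.tolist())
print(vectorized.tolist())
```

The two options differ at the boundary: an account created later in the day on 2022-01-01 is excluded by the date-to-date comparison but included by the Timestamp comparison. If created_at turns out to be timezone-aware, the cutoff Timestamp in option 2 would likely need a matching timezone, while the .date() form avoids that question.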

The runtime environment is an AWS EC2 instance running Amazon Linux 2, set up with the following script:

#!/bin/bash
sudo yum install python3.11 python3-pip git -y
python3 -m pip install --upgrade pip
git clone https://github.com/dagster-io/fake-star-detector.git
cd fake-star-detector
echo GITHUB_ACCESS_TOKEN=xxx > .env
pip install dagster dagster-webserver
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
DAGSTER_HOME=/home/ec2-user/.local/bin/ dagster-webserver -h 0.0.0.0 -p 3000

oliviertassinari commented 11 months ago

Duplicate of #7?