awslabs / amazon-dynamodb-tools

Tools to make effective use of DynamoDB easier.
Apache License 2.0
119 stars 25 forks

RecursionError: maximum recursion depth exceeded in comparison #24

Open tatzlwurm2 opened 1 year ago

tatzlwurm2 commented 1 year ago

Hello, I have tried capacity_reco.py in the following ways: with 1 table, with defaults, with limited parameters, and with all tables. I have also tried on a Mac (Darwin mac-WM22KW6QLN 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul 5 22:21:53 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6020 arm64), in a Docker Python container (Linux ea6b9d67a2de 6.3.13-linuxkit #1 SMP PREEMPT Thu Sep 7 07:48:47 UTC 2023 aarch64 GNU/Linux), and on an EC2 instance (amazonlinux host). I have also tried hardcoding the parameters to low values:

def get_params(args):
    params = {}
    params['dynamodb_tablename'] = args.dynamodb_tablename
    params['dynamodb_read_utilization'] = 10
    params['dynamodb_write_utilization'] = 10
    params['dynamodb_minimum_write_unit'] = 10
    params['dynamodb_maximum_write_unit'] = 10
    params['dynamodb_minimum_read_unit'] = 10
    params['dynamodb_maximum_read_unit'] = 10
    params['number_of_days_look_back'] = 4
    params['max_concurrent_tasks'] = 1

Additionally, I tried setting sys.setrecursionlimit(2750), which was as high as I could go. In each case I get the same error.
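As a side note on why raising the limit may not help: if two functions keep calling each other without a base case, a higher limit only delays the same error. The sketch below illustrates that with `ping` and `pong`, hypothetical stand-ins that are not taken from capacity_reco.py or pandas.

```python
import sys


def ping(depth):
    # Hypothetical stand-in: always re-enters its partner, never terminates.
    return pong(depth + 1)


def pong(depth):
    return ping(depth + 1)


def fails_at_limit(limit):
    """Return True if the mutual recursion still overflows at this limit."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        ping(0)
    except RecursionError:
        return True
    finally:
        # Always restore the interpreter's previous limit.
        sys.setrecursionlimit(old_limit)
    return False


results = {limit: fails_at_limit(limit) for limit in (1000, 2750)}
print(results)
```

Both limits end in `RecursionError`, which matches the behaviour described above: the limit changes how deep the stack gets, not whether the cycle ends.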

It's failing in this function:

def process_dynamodb_table(dynamodb_table_info: pd.DataFrame, params: dict, debug: bool) -> pd.DataFrame:
    print('starting process to get dynamodb table metrics')
    result = get_metrics(params)
    metric_df = result[0]
    estimate_df = result[1]
    print('Estimating cost...')
    summary_result = recommendation_summary(
        params, metric_df, estimate_df, dynamodb_table_info)
    cost_estimate_df = summary_result[1]
    if debug:
        filename_metrics = os.path.join(dir_path, 'metrics.csv')
        filename_estimate = os.path.join(dir_path, 'estimate.csv')
        filename_cost_estimate = os.path.join(dir_path, 'cost_estimate.csv')
        metric_df.to_csv(filename_metrics, index=False)
        estimate_df.to_csv(filename_estimate, index=False)
        cost_estimate_df.to_csv(filename_cost_estimate, index=False)
    filename_summary = os.path.join(dir_path, 'analysis_summary.csv')
    summary_result[0].to_csv(filename_summary, index=False)
    return summary_result[0]
jakob-vendegna-sp commented 8 months ago

same here, please advise.

tebanieo commented 7 months ago

Hello

I cannot replicate this error. Could you please paste the entire error trace, including how you are executing the script, so we can investigate further?

Please use single backticks to show `your command execution` and triple backticks to show the results of your execution:

including your actual error trace that 
can span across 
different lines 

That will greatly help readability.

Thanks!
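For a recursion error the full trace can run to thousands of near-identical lines, so one option is to keep only the first and last few frames before pasting. The following is a minimal sketch using the standard `traceback` module; `compact_traceback` and `recurse` are hypothetical names introduced here and are not part of capacity_reco.py.

```python
import traceback


def compact_traceback(exc, head=3, tail=3):
    """Format an exception, keeping only the first and last few frames."""
    frames = traceback.extract_tb(exc.__traceback__)
    lines = ["Traceback (most recent call last):\n"]
    if len(frames) <= head + tail:
        lines += traceback.format_list(frames)
    else:
        omitted = len(frames) - head - tail
        lines += traceback.format_list(frames[:head])
        lines.append(f"  ... {omitted} frames omitted ...\n")
        lines += traceback.format_list(frames[len(frames) - tail:])
    # Append the final "ExceptionType: message" line(s).
    lines += traceback.format_exception_only(type(exc), exc)
    return "".join(lines)


def recurse(n):
    # Hypothetical function that never stops, used only to trigger the error.
    return recurse(n + 1)


try:
    recurse(0)
except RecursionError as exc:
    report = compact_traceback(exc)

print(report)
```

The first frames show how the script entered the failing call and the last frames show where the limit was hit, which is usually the part a maintainer needs.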

jakob-vendegna-sp commented 7 months ago
Getting DynamoDB Tables info ...:   0%|          | 0/1 [00:00<?, ?it/s]
Getting DynamoDB Tables info ...: 100%|██████████| 1/1 [00:00<00:00,  1.15it/s]
starting process to get dynamodb table metrics

  0%|          | 0/9 [00:00<?, ?it/s]
 11%|█         | 1/9 [00:00<00:02,  3.95it/s]
 56%|█████▌    | 5/9 [00:03<00:03,  1.28it/s]
100%|██████████| 9/9 [00:03<00:00,  2.41it/s]
starting process to estimate dynamodb table provisioned metrics

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00,  3.50it/s]
100%|██████████| 1/1 [00:00<00:00,  3.50it/s]
Estimating cost...
Traceback (most recent call last):
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 399, in array_ufunc
    result = getattr(ufunc, method)(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 399, in array_ufunc
    result = getattr(ufunc, method)(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 399, in array_ufunc
    result = getattr(ufunc, method)(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 399, in array_ufunc
    result = getattr(ufunc, method)(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<omitted approximately 2900 lines of the EXACT SAME error blocks repeating>

  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 399, in array_ufunc
    result = getattr(ufunc, method)(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 2168, in __array_ufunc__
    return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakob.vendegna/sailpoint/dynamodb-cost-tools/.venv/lib/python3.11/site-packages/pandas/core/arraylike.py", line 266, in array_ufunc
    from pandas.core.internals import (
RecursionError: maximum recursion depth exceeded
jakob-vendegna-sp commented 7 months ago

https://github.com/awslabs/amazon-dynamodb-tools/pull/31

I fixed this on my end by un-nesting the where clauses. The view_df dataframe is quite large, causing the recursion error in pandas. I'm on an M1 Mac, which likely has different recursion limits; however, increasing the limit had no positive effect.
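To illustrate what "un-nesting the where clauses" can look like in general, here is a small sketch that rewrites a nested `np.where` as a flat `np.select`. This is not the code from the PR: the column names, thresholds, and labels are hypothetical, since the real view_df columns are not shown in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real view_df schema is an assumption.
df = pd.DataFrame(
    {"consumed": [5.0, 50.0, 500.0], "provisioned": [10.0, 40.0, 100.0]}
)

# Work on plain NumPy arrays so the comparisons are evaluated by NumPy directly.
consumed = df["consumed"].to_numpy()
provisioned = df["provisioned"].to_numpy()

# Nested form: each np.where wraps another np.where.
nested = np.where(
    consumed <= provisioned,
    "ok",
    np.where(consumed <= 2 * provisioned, "warn", "over"),
)

# Flat form: conditions are listed in priority order and the first match wins.
conditions = [
    consumed <= provisioned,
    consumed <= 2 * provisioned,
]
flat = np.select(conditions, ["ok", "warn"], default="over")

print(list(flat))
```

Both forms produce the same labels. The flat version is easier to read and extend; whether it also avoids the recursion seen in this issue depends on the pandas and NumPy versions involved, which this sketch does not establish.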

tebanieo commented 7 months ago

Thanks for this comment, it sheds some light on our investigation. You are using Python 3.11; we have tested the script using Python 3.8 and 3.9. I will try to replicate using 3.11 both on my laptop and on EC2.

There are some breaking changes between pandas 1.5 and 2.2, so I will be very conservative about that upgrade. I will check with @switch180 as well, to ensure that the change won't break another tool.
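Since the Python, pandas, and NumPy versions appear to matter here, a short script like the one below can be run in the same virtual environment as the tool and its output pasted into the issue. It is only a sketch: `report_versions` is a hypothetical helper, and the package list is an assumption based on the libraries mentioned in this thread.

```python
import platform
from importlib import metadata


def report_versions(packages=("pandas", "numpy")):
    """Collect the interpreter version and installed package versions."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            versions[name] = "not installed"
    return versions


versions = report_versions()
for name, version in versions.items():
    print(f"{name}: {version}")
```

Reading versions through `importlib.metadata` avoids importing the packages themselves, so the script still runs if one of them is broken or missing.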

jakob-vendegna-sp commented 7 months ago

I may have just updated numpy and pandas to see if that was the cause. I can remove the requirements file from my PR or make any other changes you might want to see.

switch180 commented 7 months ago

In #35 we merged a PR that should fix this problem. Pull from tip of master branch. Please review the code @tatzlwurm2 @jakob-vendegna-sp and see if it works for you without error.

Once we have confirmation you're unblocked we will resolve this issue.

jakob-vendegna-sp commented 7 months ago

I'll have to cherry pick the commits into my own detached fork -- no aws session or profile support. But I very much appreciate the attention to this, thank you! I'll test as soon as time allows.