eliasdabbas / advertools

advertools - online marketing productivity and analysis tools
https://advertools.readthedocs.io
MIT License

Split, Parse, and Analyze URL Structure Error #372

Closed huseyincenik closed 3 months ago

huseyincenik commented 3 months ago

I cannot run the following code from your documentation page. It returns an error. https://advertools.readthedocs.io/en/master/advertools.urlytics.html


import advertools as adv

urls = ['https://netloc.com/path_1/path_2?price=10&color=blue#frag_1',
        'https://netloc.com/path_1/path_2?price=15&color=red#frag_2',
        'https://netloc.com/path_1/path_2/path_3?size=sm&color=blue#frag_1',
        'https://netloc.com/path_1?price=10&color=blue']
adv.url_to_df(urls)

Error:


TypeError                                 Traceback (most recent call last)
Cell In[20], line 7
      1 import advertools as adv
      3 urls = ['https://netloc.com/path_1/path_2?price=10&color=blue#frag_1',
      4         'https://netloc.com/path_1/path_2?price=15&color=red#frag_2',
      5         'https://netloc.com/path_1/path_2/path_3?size=sm&color=blue#frag_1',
      6         'https://netloc.com/path_1?price=10&color=blue']
----> 7 adv.url_to_df(urls)

File ~\AppData\Roaming\Python\Python310\site-packages\advertools\urlytics.py:266, in url_to_df(urls, decode, output_file)
    264 urldf = _url_to_df(sublist, decode=decode)
    265 urldf.index = range(i * step, (i * step) + len(urldf))
--> 266 urldf.to_parquet(f"{tmpdir}/{i:08}.parquet", index=True, version="2.6")
    267 final_df_list = [
    268     pd.read_parquet(f"{tmpdir}/{tmpfile}") for tmpfile in os.listdir(tmpdir)
    269 ]
    270 final_df = pd.concat(final_df_list).sort_index()

File C:\ProgramData\anaconda3\envs\r_base_n\lib\site-packages\pandas\util\_decorators.py:333, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    327 if len(args) > num_allow_args:
    328     warnings.warn(
    329         msg.format(arguments=_format_argument_list(allow_args)),
    330         FutureWarning,
    331         stacklevel=find_stack_level(),
    332     )
--> 333 return func(*args, **kwargs)

File C:\ProgramData\anaconda3\envs\r_base_n\lib\site-packages\pandas\core\frame.py:3113, in DataFrame.to_parquet(self, path, engine, compression, index, partition_cols, storage_options, **kwargs)
   3032 """
   3033 Write a DataFrame to the binary parquet format.
   (...)
   3109 >>> content = f.read()
   3110 """
   3111 from pandas.io.parquet import to_parquet
-> 3113 return to_parquet(
   3114     self,
   3115     path,
   3116     engine,
   3117     compression=compression,
   3118     index=index,
   3119     partition_cols=partition_cols,
   3120     storage_options=storage_options,
   3121     **kwargs,
   3122 )

File C:\ProgramData\anaconda3\envs\r_base_n\lib\site-packages\pandas\io\parquet.py:480, in to_parquet(df, path, engine, compression, index, storage_options, partition_cols, filesystem, **kwargs)
    476 impl = get_engine(engine)
    478 path_or_buf: FilePath | WriteBuffer[bytes] = io.BytesIO() if path is None else path
--> 480 impl.write(
    481     df,
    482     path_or_buf,
    483     compression=compression,
    484     index=index,
    485     partition_cols=partition_cols,
    486     storage_options=storage_options,
    487     filesystem=filesystem,
    488     **kwargs,
    489 )
    491 if path is None:
    492     assert isinstance(path_or_buf, io.BytesIO)

File C:\ProgramData\anaconda3\envs\r_base_n\lib\site-packages\pandas\io\parquet.py:349, in FastParquetImpl.write(self, df, path, compression, index, partition_cols, storage_options, filesystem, **kwargs)
    344     raise ValueError(
    345         "storage_options passed with file object or non-fsspec file path"
    346     )
    348 with catch_warnings(record=True):
--> 349     self.api.write(
    350         path,
    351         df,
    352         compression=compression,
    353         write_index=index,
    354         partition_on=partition_cols,
    355         **kwargs,
    356     )

TypeError: write() got an unexpected keyword argument 'version'
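
For reference, the failing call is the to_parquet(..., version="2.6") on line 266 of urlytics.py. The snippet below is a minimal sketch that reproduces the same TypeError outside advertools, assuming fastparquet is installed and ends up handling the write (as FastParquetImpl does in the traceback above); the version keyword is understood by pyarrow's parquet writer but not by fastparquet's, so it gets rejected as an unexpected keyword argument.

import pandas as pd

# Minimal reproduction sketch (assumes the fastparquet engine, as in the
# traceback above). The "version" keyword is forwarded via **kwargs to
# fastparquet's write(), which does not accept it and raises.
df = pd.DataFrame({"a": [1, 2, 3]})
df.to_parquet("example.parquet", index=True, version="2.6", engine="fastparquet")
# TypeError: write() got an unexpected keyword argument 'version'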

eliasdabbas commented 3 months ago

Thanks for reporting.

This error could mean that you have an old version of pyarrow.

TypeError: write() got an unexpected keyword argument 'version'

Can you please share the versions of advertools, pandas, and pyarrow that you are using?
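
For anyone hitting the same error, here is a quick way to collect those versions (a minimal sketch; it also checks fastparquet, since that is the engine raising in the traceback):

import advertools as adv
import pandas as pd

# Print the package versions relevant to this report. pyarrow and fastparquet
# are imported defensively because either one may be missing.
print("advertools:", adv.__version__)
print("pandas:", pd.__version__)
for pkg in ("pyarrow", "fastparquet"):
    try:
        print(f"{pkg}:", __import__(pkg).__version__)
    except ImportError:
        print(f"{pkg}: not installed")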

I just double-checked on Google Colab, and the code you shared worked fine.

huseyincenik commented 3 months ago

Hello, thank you. I was having this problem because of my old versions. After I restarted Anaconda, the problem was solved. @eliasdabbas