eliasdabbas / advertools

advertools - online marketing productivity and analysis tools
https://advertools.readthedocs.io
MIT License

request_url_df creates wide list? #337

Closed by Dibal 7 months ago

Dibal commented 7 months ago

After applying adv.url_to_df(logs_df['request']) to my dataset, the resulting DataFrame explodes to more than 120 columns, with names like:

'query_template', 'query_archive', 'query_key', 'query_per', 'queryx', "query[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0][

Applying it to the referer column produces another 40 columns. Is this behavior intended?

eliasdabbas commented 7 months ago

Each query parameter in the list of URLs is supposed to have a column in the resulting DataFrame.

If you got 120, this means there are 120 unique query parameters in your dataset.
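To illustrate the point above, here is a minimal standard-library sketch that collects the unique query parameter names across a list of URLs. It is not advertools' implementation, only a rough way to see how many distinct parameters (and so how many query columns) to expect. The sample URLs and the helper name `unique_query_params` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical sample URLs, not taken from the reporter's dataset
urls = [
    "/index.php?view=article&id=49",
    "/index.php?view=article&print=1",
    "/index.php?option=com_content&Itemid=47",
]

def unique_query_params(urls):
    """Return the sorted unique query parameter names seen across all URLs."""
    names = set()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so empty parameters such as "page=" still count
        for name, _ in parse_qsl(query, keep_blank_values=True):
            names.add(name)
    return sorted(names)

params = unique_query_params(urls)
print(len(params), params)
```

Every distinct name adds to the count, even if it appears in a single URL, which is why a handful of malformed requests can widen the result considerably.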

query_[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0][

Is this the actual column name that you got? If so, can you please share a sample of those URLs so I can look into it?

Dibal commented 7 months ago

https://file.io/5cFRYXltXG5E 202020

eliasdabbas commented 7 months ago

Looks fine. This parameter appears in one of the URLs twice (out of 148,272). It's probably a bug somewhere on the server causing this parameter to be created. The original URLs contain this in the url column.

You might want to try this URL from a browser, with and without the long parameter and see what happens.

"/index.php?view=article&catid=34:raith&id=49:ferienwohnung-1&tmpl=component&print=1&[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['layout']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']=/../../../../%2sasae/WEB-INF/web.xml&page=&option=com_content&Itemid=47"
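One possible way to track down parameters like this is to count how many URLs each parameter name appears in and list the rare ones. The sketch below uses only the standard library. The helper names `param_url_counts` and `rare_params`, the `max_urls` threshold, and the shortened sample URLs are assumptions for illustration, not part of advertools.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def param_url_counts(urls):
    """Count, for each query parameter name, how many URLs contain it."""
    counts = Counter()
    for url in urls:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        # Use a set so a name repeated within one URL is counted once for it
        counts.update({name for name, _ in pairs})
    return counts

def rare_params(urls, max_urls=1):
    """Return parameter names that appear in at most max_urls URLs."""
    counts = param_url_counts(urls)
    return sorted(name for name, n in counts.items() if n <= max_urls)

# Hypothetical, shortened sample modelled on the URL above
sample = [
    "/index.php?view=article&id=49&print=1",
    "/index.php?view=article&id=50",
    "/index.php?view=article&id=51&[0]['[0]['layout']']=/../x&page=",
]

rare = rare_params(sample)
print(rare)
```

Names that show up in only one or two URLs out of many thousands are good candidates for inspection, and their columns can be dropped if they turn out to come from malformed or probing requests.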

Thanks for sharing.