Closed: Dibal closed this 7 months ago
Each query parameter in the list of URLs is supposed to have a column in the resulting DataFrame.
If you got 120, this means there are 120 unique query parameters in your dataset.
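To make the "one column per unique query parameter" idea concrete, here is a rough standard-library sketch. It is not how advertools implements `url_to_df`; the helper name `query_columns`, the `query_` prefix argument, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def query_columns(urls, prefix="query_"):
    """Return the sorted column names that a one-column-per-parameter layout would need."""
    names = set()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so parameters like "page=" are still counted
        for name, _ in parse_qsl(query, keep_blank_values=True):
            names.add(prefix + name)
    return sorted(names)

# Hypothetical sample; a real log would have far more URLs
sample_urls = [
    "/index.php?view=article&id=49&print=1",
    "/index.php?option=com_content&Itemid=47",
    "/index.php?view=article&page=",
]
columns = query_columns(sample_urls)
print(len(columns), columns)
```

The number of names returned grows with every new parameter name seen anywhere in the list, which is why a large log file can end up with a very wide DataFrame.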
query_[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0][
Is this the actual column name that you got? If so, can you please share a sample of those URLs so I can look into it?
https://file.io/5cFRYXltXG5E
Looks fine. This parameter appears in one of the URLs twice (out of 148,272). It's probably a bug somewhere on the server causing this parameter to be created. The original URLs contain this in the url column.
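One way to spot such one-off parameters in a large list is to count how many URLs each parameter name shows up in. The following is only a sketch with the standard library; `parameter_url_counts` and the sample URLs (including the `odd[0]` parameter) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def parameter_url_counts(urls):
    """Count, for each query parameter name, the number of URLs it appears in."""
    counts = Counter()
    for url in urls:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        # A set, so a name repeated inside one URL is counted once for that URL
        counts.update({name for name, _ in pairs})
    return counts

# Hypothetical sample URLs
sample_urls = [
    "/index.php?view=article&id=1",
    "/index.php?view=article&id=2",
    "/index.php?view=article&id=3&odd[0]=x",
]
counts = parameter_url_counts(sample_urls)
# Parameters seen in only one URL are candidates for closer inspection
rare = [name for name, n in counts.items() if n == 1]
print(rare)
```

Parameters that occur in a single URL out of many thousands are usually worth a closer look before treating them as regular columns.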
You might want to try this URL from a browser, with and without the long parameter, and see what happens.
"/index.php?view=article&catid=34:raith&id=49:ferienwohnung-1&tmpl=component&print=1&[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['layout']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']']=/../../../../%2sasae/WEB-INF/web.xml&page=&option=com_content&Itemid=47"
Thanks for sharing.
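For the with-and-without comparison suggested earlier, a small helper can produce the variant of a URL that lacks the unusually long parameter. This is a sketch only: `drop_long_parameters` and its `max_name_length` threshold of 50 are assumptions, and the test URL is a shortened stand-in rather than the real one.

```python
from urllib.parse import urlsplit, urlunsplit

def drop_long_parameters(url, max_name_length=50):
    """Return the URL without query parameters whose names exceed max_name_length."""
    parts = urlsplit(url)
    kept = []
    # Work on the raw query string so the remaining parameters are not re-encoded
    for pair in parts.query.split("&"):
        name = pair.split("=", 1)[0]
        if len(name) <= max_name_length:
            kept.append(pair)
    return urlunsplit(parts._replace(query="&".join(kept)))

# Shortened, made-up stand-in for the nested parameter name
long_name = "[0]['" * 20 + "layout" + "']" * 20
url = "/index.php?view=article&" + long_name + "=x&page=&Itemid=47"
cleaned = drop_long_parameters(url)
print(cleaned)
```

Applying this to the request column before calling `url_to_df` would keep such parameters from becoming columns, at the cost of hiding them from the analysis.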
After applying
adv.url_to_df(logs_df['request'])
to my dataset, the DataFrame explodes to more than 120 columns with names like: 'query_template', 'query_archive', 'query_key', 'query_per', 'query_x', "query_[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0][
Applying it to the referer produces another 40 columns. Is this behavior intended?