estebanpdl / telegram-tracker

The package connects to Telegram's API to generate JSON files containing data for channels, including information and posts. It allows you to search for specific channels or a set of channels provided in a text file, with one channel per line.

Pandas Merge Error #13

Closed Firebug24k closed 1 month ago

Firebug24k commented 3 months ago

When running this on a large channel I get the following error:

Writing channel data... done.

Writing posts data... done.

Traceback (most recent call last):
  File "/empire/secure/telegram/newscript/telegram-tracker/main.py", line 387, in <module>
    df = df.merge(counter_df, how='left', on='id')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/firebug/miniconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 10832, in merge
    return merge(
           ^^^^^^
  File "/home/firebug/miniconda3/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 184, in merge
    return op.get_result(copy=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/firebug/miniconda3/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 888, in get_result
    result = self._reindex_and_concat(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/firebug/miniconda3/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 840, in _reindex_and_concat
    llabels, rlabels = _items_overlap_with_suffix(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/firebug/miniconda3/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 2757, in _items_overlap_with_suffix
    raise MergeError(
pandas.errors.MergeError: Passing 'suffixes' which cause duplicate columns {'from_messages_x', 'channel_req_targeted_by_x', 'channel_request_x', 'source_x', 'counter_x'} is not allowed.

estebanpdl commented 3 months ago

Could you let me know if you tried to add new information to an existing file? The current tool should not be creating duplicate columns.

SmartFinn commented 1 month ago

I'm facing this issue too. Yes, it happened when I tried to update an existing dataset. Here's the broken output folder (output.tar.gz); the command used for the update was: python3 main.py --telegram-channel WarArchive_ua

estebanpdl commented 1 month ago

A virtual environment is recommended to avoid version conflicts between libraries. Also, the tool is not designed to update existing databases, so use a different output path.
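For anyone hitting the same traceback, here is a minimal sketch of how this MergeError can arise when a frame loaded from a previous run is merged again. It is not the tool's actual code. The data is made up, the column name `counter` is borrowed from the error message, `try_merge` and `drop_overlap` are hypothetical helpers, and pandas 2.x is assumed (older versions only emitted a warning here).

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up stand-in for data written by an earlier run: it already holds
# 'counter' plus a suffixed 'counter_x' left behind by a previous merge.
existing = pd.DataFrame({"id": [1, 2], "counter": [5, 7], "counter_x": [5, 7]})

# Made-up stand-in for the freshly computed counters.
counter_df = pd.DataFrame({"id": [1, 2], "counter": [6, 9]})


def try_merge(left, right, key="id"):
    """Return the merged frame, or the error message if pandas refuses."""
    try:
        return left.merge(right, how="left", on=key)
    except MergeError as exc:
        return str(exc)


def drop_overlap(left, right, key="id"):
    """Hypothetical helper: drop left-hand columns that the merge would
    have to suffix, along with stale '<name>_x' / '<name>_y' copies."""
    shared = [c for c in right.columns if c != key]
    stale = {f"{c}{suffix}" for c in shared for suffix in ("", "_x", "_y")}
    return left.drop(columns=[c for c in left.columns if c in stale])


# 'counter' overlaps, so pandas renames it to 'counter_x' on the left,
# which collides with the 'counter_x' that is already there.
failed = try_merge(existing, counter_df)
print(failed)

# With the overlapping and stale columns removed, no suffixes are needed.
merged = try_merge(drop_overlap(existing, counter_df), counter_df)
print(merged)
```

Dropping the old columns discards the previous counters, so this only makes sense if the fresh values are meant to replace them. Writing to a new output path, as suggested above, avoids the problem altogether.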

estebanpdl commented 1 month ago

In the coming weeks, I'll update the repository with new features that may help avoid these issues. Thanks all for your patience 🙏