estebanpdl / telegram-tracker

The package connects to Telegram's API to generate JSON files containing data for channels, including information and posts. It allows you to search for specific channels or a set of channels provided in a text file, with one channel per line.

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat #2

Closed benborges closed 1 year ago

benborges commented 2 years ago

image

benborges commented 2 years ago

@estebanpdl I haven't been able to reproduce the error on my side anymore, but this one has taken center stage and happens consistently with that channel.

estebanpdl commented 2 years ago

This one is peculiar. Not sure if you opened this file and then ran the program again? Right now, the program saves all files in the output folder. I recommend having the output folder clean/empty before running a new set of channels.
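For reference, pandas raises the error in the issue title when the key column is text (object dtype) in one frame and integer (int64) in the other. Whether leftover files in the output folder caused that here is an assumption, not something confirmed in this thread. A minimal sketch, using a hypothetical `id` column and made-up values:

```python
import pandas as pd

# Hypothetical frames: the "id" column name and all values are illustrative only.
# "previous" mimics data whose ids were stored as text (object dtype);
# "current" mimics freshly collected data whose ids are integers (int64).
previous = pd.DataFrame(
    {"id": pd.Series(["100", "200"], dtype=object), "title": ["a", "b"]}
)
current = pd.DataFrame({"id": [100, 200], "posts": [5, 7]})


def merge_fails(left, right, key):
    """Return True if pd.merge raises ValueError for these frames."""
    try:
        pd.merge(left, right, on=key)
    except ValueError:
        return True
    return False


def merge_with_matching_keys(left, right, key):
    """Cast the key column to string on both sides, then merge."""
    left = left.assign(**{key: left[key].astype(str)})
    right = right.assign(**{key: right[key].astype(str)})
    return pd.merge(left, right, on=key)


mismatch = merge_fails(previous, current, "id")
merged = merge_with_matching_keys(previous, current, "id")
```

Casting both key columns to the same dtype before merging is one common workaround; it is not necessarily how the tracker handles it.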

benborges commented 2 years ago

So between executions, the output folder should be cleaned?

I'm asking because I need this to be "incremental", and I'm afraid that if I clean the output folder, it will restart from the beginning.

benborges commented 2 years ago

No, I haven't tried to open the file, because I don't know which one is affected.

benborges commented 2 years ago

This is my current output/data folder: image

estebanpdl commented 2 years ago

In this error, the problem occurred with collected_chats.csv. If you want to update this constantly, for now, it would be better to merge those datasets separately after each execution. Right now - for better performance - it is recommended to have the output folder clean before executing a new request.
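A minimal sketch of merging the datasets from separate runs, assuming each run's `collected_chats.csv` has an identifier column (called `id` here, with a `username` column, both hypothetical names). In-memory CSV text stands in for the files so the example is self-contained; in practice each frame would come from `pd.read_csv` on that run's file.

```python
import io

import pandas as pd


def combine_runs(frames, key):
    """Stack frames from several runs and keep one row per key value.

    The key is cast to string first so that runs where it was read as
    int64 and runs where it was read as object can be combined safely.
    """
    normalized = [f.assign(**{key: f[key].astype(str)}) for f in frames]
    combined = pd.concat(normalized, ignore_index=True)
    # keep="last" prefers the most recent run when a key appears twice.
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Hypothetical contents of collected_chats.csv from two separate runs.
run_1 = pd.read_csv(io.StringIO("id,username\n100,chan_a\n200,chan_b\n"))
run_2 = pd.read_csv(io.StringIO("id,username\n200,chan_b\n300,chan_c\n"))

combined = combine_runs([run_1, run_2], "id")
```

Using `pd.concat` followed by `drop_duplicates` avoids the key-dtype check that `pd.merge` performs, at the cost of treating the runs as rows to stack, not tables to join.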

estebanpdl commented 2 years ago

Just as a note, in my own research, I have been dealing a bit with updating the collected_chats file. So, I am planning to update the code into something that allows updating the new request with the previous ones, in case this is necessary.

benborges commented 2 years ago

I have loads of questions...

Does this mean that between runs I can erase the whole data folder, and it will be recreated for each run? And does merging mean combining all the CSV and JSON files from each run's "data" folder into a unified archive? If that's the case, I don't understand where it keeps track of the channel/message IDs already written to the JSON.

I'm probably confused.

Unless the only file that stays between runs is counter.csv? That would make sense.

Dhadora commented 1 year ago

image

Is there a solution?

estebanpdl commented 1 year ago

Hi, please clone the new updated version. I was not able to replicate the same error using the updated one.

estebanpdl commented 1 year ago

I'll close this issue. Some of the questions here were addressed in the updated version of this repo. If you have new flags, please open a new issue. Thank you!