urspalani closed this 1 month ago
Hi. The <!DOCTYPE html> in the error message is an HTML document type declaration, but this Python code does not use HTML. Please compare your code with the original sample code.
I had a similar issue. Inspect your .py code and make sure it does not have HTML embedded in it. Alternatively, copy and paste the .py code from GitHub.
Hi, thanks for the replies. I later realized that I had downloaded the file instead of copying the code directly from the link, and that is how I ended up with some HTML in it. I have now tried the .py code; it had an error in the closing brackets, which I fixed, but I am now stuck with the attached error.
Can someone help, please?
Has anyone used this code and run it successfully without any errors? If so, could you please share that version?
Hi @urspalani I have fixed the code above and it worked for me. Please ensure the following:
Let me know if this resolves the issue. I will initiate a pull request to update the code.
Hi @xarain81, I used the original code and replaced the token. For #2 (a and b) the code already had the lowercase version, so I left those as they were and changed only c. I am still stuck with an error; please refer to the attached screenshot. I am using Python 3.8.6. Could that be the issue?
You need to run it with Python 2.7. https://www.python.org/download/releases/2.7/ The sample code here will not work on Python 3.x.
OK, I installed 2.7 and am now stuck with the attached error.
Hi @xarain81, any update on this?
Did you install the requests module?
Thanks @TK48, I installed the required module, and now there is an error on the group id. Do I need to add the group id and name in the code at line 124?
for group in getGroups():
    feed = getFeed(group["id"], group["name"])
    # Create a new CSV named after the timestamp / group id / group name, to ensure uniqueness
    csv_filename = SINCE.strftime("%Y-%m-%d %H-%M-%S") + " " + group["id"] + " " + strip(group["name"]) + ".csv"
The format depends on the platform. How about going back to the original?
a. Line 45 - Change from ("%s") to ("%S") [i.e. change of case]
b. Line 81 - Change from ("%s") to ("%S") [i.e. change of case]
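For context on the two directives: in Python's strftime, %S is the seconds field (00-59), while %s is not among the directives Python documents, so its behavior depends on the platform's C library. If the script needs a Unix timestamp at those lines, a portable sketch could look like the one below. The helper name to_unix_seconds is hypothetical, and it assumes the datetime is naive and expressed in UTC; it is not taken from the sample script.

```python
import calendar
import datetime


def to_unix_seconds(dt):
    """Convert a naive datetime, assumed to be in UTC, to Unix epoch seconds."""
    return calendar.timegm(dt.utctimetuple())


since = datetime.datetime(2020, 1, 1)
print(to_unix_seconds(since))  # 1577836800
print(since.strftime("%S"))    # "00": the seconds field only, not a timestamp
```

If the script's SINCE value is in local time rather than UTC, int(time.mktime(dt.timetuple())) would be the equivalent conversion.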
It finally seems to be working. It generated a few CSV files, but I am not sure on what basis it extracted them. Is there a parameter I can set for the groups I need to extract, and also for the number of days?
Line 15: DAYS = 14 is an option. If you set DAYS = 1, you get only the posts created within the last day.
This script looks for new posts in each group within the configured number of days, and if it finds new posts, it creates a CSV file for each group.
Yes, but is there any filter we can use for groups? We have more than 8K groups, and downloading the data for all of them is not feasible; the requests we get are to download only a few groups over a longer period. It also looks like this code extracts only closed/secret group data: when I ran it, it extracted only 40+ groups, although we have many more groups in total.
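As a rough illustration of the DAYS window described above, the check could be sketched as follows. The helper is_recent is hypothetical and only mirrors the idea of a cutoff computed from DAYS; the sample script may compute and apply its cutoff differently.

```python
import datetime

DAYS = 14  # same option name as in the script; the value here is the stated default


def is_recent(created, now, days=DAYS):
    """Return True if `created` falls within the last `days` days before `now`."""
    cutoff = now - datetime.timedelta(days=days)
    return cutoff <= created <= now


now = datetime.datetime(2021, 1, 15)
print(is_recent(datetime.datetime(2021, 1, 10), now))          # True
print(is_recent(datetime.datetime(2021, 1, 10), now, days=1))  # False
```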
I found the cause. The code does not handle paginated results and query parameters well. I modified the code and uploaded a new version. Please copy the new one and test it.
@urspalani Did you manage to test the new code?
Yes, I uploaded the new code. I changed the parameter description and the if-condition.
Line 41 and other lines related to "params"
(old) params += "&limit=" + DEFAULT_LIMIT
(new) params += "&limit=" + DEFAULT_LIMIT
Line 107
(old) if json.dumps('"paging"') in result_json:
(new) if "next" in result_json["paging"]:
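One caveat with the new condition is that result_json["paging"] raises a KeyError when a response has no "paging" key at all. A minimal sketch of following "next" links in a loop, rather than through recursive calls, is shown below. It assumes responses carry their items under a "data" key, and fetch_json is a hypothetical callable standing in for the real HTTP request; neither is confirmed by the sample script.

```python
def iter_pages(first_url, fetch_json):
    """Yield items from every page, following paging.next until it is absent.

    `fetch_json` is a hypothetical callable that takes a URL and returns the
    decoded JSON response as a dict.
    """
    url = first_url
    while url:
        result_json = fetch_json(url)
        for item in result_json.get("data", []):
            yield item
        # .get() avoids a KeyError when the response has no "paging" key
        url = result_json.get("paging", {}).get("next")


# Usage with canned responses instead of real HTTP calls
pages = {
    "page1": {"data": [1, 2], "paging": {"next": "page2"}},
    "page2": {"data": [3], "paging": {}},
}
print(list(iter_pages("page1", pages.get)))  # [1, 2, 3]
```

Because this is a loop, the number of pages is not bounded by Python's recursion limit.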
Thanks @TK48, is any parameter input required now with the new code? I got this error now.
That is the same error you had before. Could you try changing ("%s") to ("%S")?
I think we have more data; it is asking us to limit it.
This sample mentions that it can potentially overflow at lines 70 and 71. The program keeps its data in memory and would overflow if the data size is huge. How about changing DAYS on line 15? The default value is 14. You can try reducing this number and testing again.
Otherwise the code needs to be rewritten to avoid the overflow. For example, it currently starts writing the CSV only after reading everything; this can be avoided by reading and writing posts one at a time.
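The read-and-write-one-by-one idea could be sketched as below. This is only an illustration under assumptions: the field names "id" and "message", the helper write_posts, and the stand-in generator fake_posts are hypothetical, and the example uses Python 3's io.StringIO in place of a real file (under Python 2.7 the csv module expects a file opened in binary mode instead).

```python
import csv
import io


def write_posts(posts, out_file):
    """Write each post as it arrives instead of collecting them all first."""
    writer = csv.writer(out_file)
    writer.writerow(["id", "message"])
    count = 0
    for post in posts:  # `posts` can be a generator, so only one post is held at a time
        writer.writerow([post["id"], post.get("message", "")])
        count += 1
    return count


def fake_posts():
    # Stand-in for a generator that fetches posts page by page
    yield {"id": "1", "message": "hello"}
    yield {"id": "2"}


buffer = io.StringIO()  # in the real script this would be an open CSV file
print(write_posts(fake_posts(), buffer))  # 2
```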
Actually, I had set it to just 1 day, because I know we have a lot of groups and it will take time. I think I need your help to rewrite the code; or, if there were a parameter for entering the group id, that would be much more helpful.
Another parameter is on line 22: DEFAULT_LIMIT = "100". You can set a smaller number. It would be helpful.
There is a different error now; please see the attachment.
What values did you test for DEFAULT_LIMIT? Reducing DEFAULT_LIMIT increases the number of recursive calls.
Python also has a recursion limit. The default is usually 1,000. You can change the limit by adding the following code.
import sys
sys.setrecursionlimit(1500)
Note: 1,500 is just an example.
Checking in here again: is it possible to get the email id of the person who posted, and also the total number of shares of each post?
Hi, I was using the 'Archiving Content to CSV' (link) script and followed the steps given on the same page, but I still got the attached error. Requesting the team's kind attention.