Issue while executing the script in cmd prompt - Archiving Content to CSV

urspalani commented 2 years ago

Hi, I was using the 'Archiving Content to CSV' script and followed the steps on the same page, but I still got the attached error. Requesting the team's kind attention. (Attachment: Error)

TK48 commented 2 years ago

Hi, the <!DOCTYPE html> in the error message is an HTML document type declaration, but this Python code does not use HTML. Please compare your code with the original sample code.

xarain81 commented 2 years ago

I had a similar issue. Inspect your .py code and make sure it does not have HTML embedded in it. The alternative is to copy and paste the .py code from GitHub.

urspalani commented 2 years ago

Hi, thanks for the replies. I later realized that I had downloaded the file instead of copying the code directly from the link, which is why I got some HTML code. I have now tried the .py code; it had an error in the closing brackets, which I fixed, but I am now stuck with the attached error. (Attachment: error)

Can someone help, please?

urspalani commented 2 years ago

Has anyone used the code and run it successfully without any error? If so, could you please share that version?

xarain81 commented 2 years ago

Hi @urspalani I have fixed the code above and it worked for me. Please ensure the following:

1. You need to run using Python 2.7 (I read that you made some code changes; I suggest you restart from the original code).
2. There are 3 areas you need to fix:
   a. Line 45 - Change from ("%S") to ("%s") [i.e. change of case]
   b. Line 81 - Change from ("%S") to ("%s") [i.e. change of case]
   c. Line 131 - Change from ("%Y-%m-%d %H:%M:%S") to ("%Y-%m-%d %H-%M-%S") [i.e. replace the colons with hyphens]

Let me know if this resolves the issue. I will initiate a pull request to update the code.
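A note for readers on change (c): colons are not allowed in Windows file names, which is presumably why the separator in the timestamp is switched to hyphens. A minimal sketch of a filename-safe timestamp follows; the helper name csv_timestamp is hypothetical and not part of the sample script.

```python
import datetime


def csv_timestamp(moment):
    """Format a datetime for use in a file name (hypothetical helper).

    Hyphens replace the colons in the time part, because ':' is not a
    valid character in Windows file names.
    """
    return moment.strftime("%Y-%m-%d %H-%M-%S")


stamp = csv_timestamp(datetime.datetime(2022, 9, 1, 12, 7, 47))
print(stamp + ".csv")
```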

urspalani commented 2 years ago

Hi @xarain81, I used the original code and replaced the token. For #2 (a and b) it was already lowercase, so I left those as they were and changed only c. But I am still stuck with an error; please refer to the attached screenshot. Also, I am using Python 3.8.6. Will that be an issue? (Attachment: Screenshot 2022-09-01 120747)

xarain81 commented 2 years ago

You need to run it with Python 2.7: https://www.python.org/download/releases/2.7/ The sample code here will not work on Python 3.x.

urspalani commented 2 years ago

OK, I installed 2.7 and am now stuck with the attached error. (Attachment: Screenshot 2022-09-01 165749)

urspalani commented 2 years ago

Hi @xarain81, any update on this?

TK48 commented 2 years ago

Did you install the requests module?

urspalani commented 2 years ago

Thanks @TK48, I installed the required module, and now there is an error on the group id. Do I need to add the group id and name in the code at line 124?

    for group in getGroups():
        feed = getFeed(group["id"], group["name"])

        # Create a new CSV named after the timestamp / group id / group name, to ensure uniqueness
        csv_filename = SINCE.strftime("%Y-%m-%d %H-%M-%S") + " " + group["id"] + " " + strip(group["name"]) + ".csv"

(Attachment: Screenshot 2022-09-06 170709)

TK48 commented 2 years ago

The format depends on the platform. How about reverting to the original?
   a. Line 45 - Change from ("%s") to ("%S") [i.e. change of case]
   b. Line 81 - Change from ("%s") to ("%S") [i.e. change of case]
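Some background on the platform dependence: lowercase "%s" is not among the format codes Python documents for strftime. It is handed to the platform's C library, where glibc treats it as seconds since the epoch, while the Windows C runtime does not support it. Uppercase "%S" is the two-digit seconds field (00 to 59), so the two are not equivalent. If the intent of those lines is a Unix timestamp (an assumption, since the thread does not show them), a portable sketch could look like this; to_epoch_seconds is a hypothetical helper name.

```python
import calendar
import datetime


def to_epoch_seconds(moment_utc):
    """Convert a naive UTC datetime to Unix epoch seconds.

    Hypothetical helper: it avoids strftime("%s"), which only works where
    the platform C library happens to support that directive.
    """
    return calendar.timegm(moment_utc.timetuple())


epoch_start = to_epoch_seconds(datetime.datetime(1970, 1, 1))
sept_first = to_epoch_seconds(datetime.datetime(2022, 9, 1))
print(epoch_start, sept_first)
```

The input is assumed to already be in UTC; a local-time datetime would need converting first.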

urspalani commented 2 years ago

Finally it seems to be working. It generated a few CSV files, but I am not sure on what basis it extracted them. Can I pass a parameter for the groups I need to extract and also for the number of days?

TK48 commented 2 years ago

Line 15: DAYS = 14 is an option. If you set DAYS = 1, you get only the posts created within the last day.

The script looks for new posts in each group within the configured number of days, and if it finds any, it creates a CSV file for each group.
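A minimal sketch of how a DAYS setting typically maps to a cutoff time. DAYS and SINCE are names that appear in this thread; compute_since is a hypothetical helper, and the exact computation in the sample script may differ.

```python
import datetime

DAYS = 14  # look-back window in days, as on line 15 of the sample


def compute_since(now, days):
    """Return the earliest creation time that still counts as 'new'."""
    return now - datetime.timedelta(days=days)


# A fixed 'now' keeps the example reproducible; real code would use the current time.
SINCE = compute_since(datetime.datetime(2022, 9, 15, 12, 0, 0), DAYS)
print(SINCE)
```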

urspalani commented 2 years ago

Yes, but is there any filter we can use for groups? We have around 8K+ groups, so it is not feasible to download the data for all of them, and we get requests to download only a few groups for a longer period. Also, it looks like this code extracts data only for closed/secret groups: when I ran it, it extracted only 40+ groups, whereas we have many more groups in total.
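One way such a filter could be added is an allow-list of group ids applied before the per-group loop. This is only a sketch: TARGET_GROUP_IDS and select_groups are hypothetical names, the ids are made up, and the group dictionaries are assumed to carry "id" and "name" keys as in the loop quoted earlier in this thread.

```python
# Hypothetical allow-list; an empty set means "export every group".
TARGET_GROUP_IDS = {"111", "333"}


def select_groups(groups, wanted_ids):
    """Keep only the groups whose id is in wanted_ids."""
    if not wanted_ids:
        return list(groups)
    return [group for group in groups if group["id"] in wanted_ids]


# Stand-in for the list that the script's getGroups() would return.
sample_groups = [
    {"id": "111", "name": "Announcements"},
    {"id": "222", "name": "Social"},
    {"id": "333", "name": "Engineering"},
]
selected = select_groups(sample_groups, TARGET_GROUP_IDS)
print([group["name"] for group in selected])
```

In the script, the loop would then iterate over the filtered list instead of the full result of getGroups().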

TK48 commented 2 years ago

I found the cause. The code does not handle paginated results and query parameters well. I modified the code and uploaded the new version. Please copy the new one and test it.

rwicks001 commented 2 years ago

@urspalani Did you manage to test the new code?

TK48 commented 2 years ago

Yes, I uploaded the new code. I changed how the parameters are written and the if-condition.

Line 41 and other lines related to "params"
(old) params += "&limit=" + DEFAULT_LIMIT
(new) params += "&limit=" + DEFAULT_LIMIT

Line 107
(old) if json.dumps('"paging"') in result_json:
(new) if "next" in result_json["paging"]:
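For readers, a sketch of following "next" links with a loop instead of recursion. It assumes each page is a dictionary with a "data" list and an optional "paging" entry holding a "next" URL; fetch_all and the fake pages are hypothetical and stand in for real HTTP requests.

```python
def fetch_all(first_url, fetch_page):
    """Collect items from every page by following paging["next"] links.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON page as a dict. A loop is used, so the number of pages is not
    bounded by Python's recursion limit.
    """
    items = []
    url = first_url
    while url:
        page = fetch_page(url)
        items.extend(page.get("data", []))
        # A missing "paging" or "next" key ends the loop.
        url = page.get("paging", {}).get("next")
    return items


# Fake pages keyed by URL, used here instead of real network calls.
fake_pages = {
    "page1": {"data": [1, 2], "paging": {"next": "page2"}},
    "page2": {"data": [3], "paging": {"next": "page3"}},
    "page3": {"data": [4, 5], "paging": {}},
}
all_items = fetch_all("page1", fake_pages.__getitem__)
print(all_items)
```

Using .get with a default also covers responses that have no "paging" key at all, which a direct result_json["paging"] lookup would not.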

urspalani commented 2 years ago

Thanks @TK48, is any parameter input required with the new code? I got this error now. (Attachment: Screenshot 2022-09-12 125203)

TK48 commented 2 years ago

That is the same error you had before. Could you try changing ("%s") to ("%S")?

urspalani commented 2 years ago

I think we have too much data; it is asking us to limit it.

TK48 commented 2 years ago

The sample notes that it can potentially overflow on lines 70 and 71. The program keeps all the data in memory and would overflow if the data size is huge. How about changing DAYS on line 15? The default value is 14. You can try reducing this number and testing again.

Otherwise, the code needs to be rewritten to avoid the overflow. For example, it currently starts writing the CSV only after reading everything, but this can be avoided by reading and writing one item at a time.
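A minimal sketch of the "read and write one by one" idea, using a generator so that only one row is held in memory at a time. The names write_rows_incrementally and generate_posts and the column names are hypothetical; an in-memory buffer stands in for the CSV file.

```python
import csv
import io


def write_rows_incrementally(rows, out_file, fieldnames):
    """Write each row as soon as it is produced; return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in rows:  # rows may be a generator; it is never turned into a list
        writer.writerow(row)
        count += 1
    return count


def generate_posts():
    """Stand-in for a feed reader that yields posts one at a time."""
    yield {"id": "1", "message": "hello"}
    yield {"id": "2", "message": "world"}


buffer = io.StringIO()
written = write_rows_incrementally(generate_posts(), buffer, ["id", "message"])
print(written)
```

This sketch targets Python 3; under Python 2.7 the csv module expects a file opened in binary mode rather than a text buffer.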

urspalani commented 2 years ago

Actually, I had set it to just 1 day, because I know we have a lot of groups and it will take time. I think I need your help to rewrite the code; or, if there were a parameter to input the group id, it would be very helpful.

TK48 commented 2 years ago

Another parameter is on line 22: DEFAULT_LIMIT = "100". You can set a smaller number. It might help.

urspalani commented 2 years ago

There is a different error now, please see the attachment.

Error_12092022.txt

TK48 commented 2 years ago

What values did you test for DEFAULT_LIMIT? Reducing DEFAULT_LIMIT increases the number of recursive calls.

Python also has a recursion limit. The default is usually 1,000. You can change the limit by adding the following code.

    import sys
    sys.setrecursionlimit(1500)

Note: 1,500 is just an example.
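A slightly more defensive variant of the same idea, shown only as a sketch: it raises the limit when the current one is lower and never lowers it. ensure_recursion_limit is a hypothetical helper name, and a very high limit can still crash the interpreter if the C stack runs out.

```python
import sys


def ensure_recursion_limit(minimum):
    """Raise the recursion limit to at least `minimum`; never lower it."""
    if sys.getrecursionlimit() < minimum:
        sys.setrecursionlimit(minimum)
    return sys.getrecursionlimit()


limit = ensure_recursion_limit(1500)
print(limit)
```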

urspalani commented 1 year ago

Checking here again: is it possible to get the email id of the person who posted, and also the total shares of each post?

fbsamples / workplace-platform-samples

Issue while executing the script in cmd prompt - Archiving Content to CSV #113