Open oddtazz opened 3 years ago
@oddtazz This is a tricky one for me because I cannot reproduce your error with my computers, which means there is likely something different/unique about your WhatsApp and HTML that I have not encountered yet. I'm unable to provide any precise fixes for this unless I can see some of the HTML to investigate the cause.
If you are OK sharing your HTML with private info removed, then run the whatsoup.py script again. When the error is thrown, don't close the script or your browser; instead, open dev tools in Chrome (F12), run the below JavaScript snippet in the Console, verify there's no other private info in the output, and then send it to me on here (you can also send it privately/offline). I attached a photo of how it looks when run.
If you are not OK with this approach, then an easy workaround is replacing line 212 of whatsoup.py with last_chat_msg = '' which will just make the terminal window not show any of the last chat messages when you're selecting a chat to scrape/export, without affecting the actual scraping/exporting of your chat history.
// This script sets all names/messages from chats in the left chat pane to 'redacted' instead of private info

// Get all of the viewable 'chat cards' from left pane. A chat card is what it sounds like, the rectangle with a person/groups photo and name, last message, and last message time. Note: these load dynamically based on your viewport, so it will only get the cards which can be seen in your browser window.
var chat_cards = document.querySelector('[aria-label*="Chat list"]').childNodes

// Loop through each of the viewable chat cards
for (let i=0; i<chat_cards.length; i++){
    // Get all descendants for a chat card
    let elems = chat_cards[i].querySelectorAll('*')

    // Loop through each descendant and search for the elements which hold private data, replacing it with 'redacted'
    for (let j=0; j<elems.length; j++){
        if (elems[j].getAttribute('title')){
            console.log("Replacing '%s' with 'redacted'", elems[j].getAttribute('title'));
            elems[j].setAttribute('title','redacted');
            elems[j].innerText = 'redacted';
        }
    }
}

// Print the redacted HTML so it can be safely shared
document.querySelector('[aria-label*="Chat list"]').outerHTML;
Hey I have created a secret gist with the information you asked for https://gist.githubusercontent.com/oddtazz/10bb6e2111c392d960c53db6146efff4/raw/86f985f04b90c81f54e5ef24f92653b14dd260cf/redacted.html
@oddtazz I'm not seeing anything out of the ordinary. Any chance you have Chrome extensions installed that may be modifying the DOM?
Also, did you try the other workaround I noted above by setting the offending line to an empty string: last_chat_msg = ''? The only critical piece of data that's needed in the get_chats function is name_of_chat, which you aren't having issues with, so setting the last_chat_msg variable to an empty string should fix the issue.
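A minimal sketch of that idea in Python, for anyone following along. This is not the code in whatsoup.py: `optional_text`, `build_chat`, and the zero-argument lookup callables are hypothetical names used only for illustration. Optional fields fall back to an empty string when their lookup fails, while the chat name is still treated as required.

```python
def optional_text(lookup, default=''):
    """Return lookup(), or `default` if the lookup raises or yields nothing.

    `lookup` is a hypothetical zero-argument callable; in a real script it
    might wrap an element lookup for the last message or its timestamp.
    """
    try:
        value = lookup()
    except Exception:
        # Assumption: any lookup failure means the field is absent for this chat
        return default
    return value if value else default


def build_chat(get_name, get_time, get_message):
    """Build a chat record; only the name is treated as required."""
    return {
        "name": get_name(),  # Errors propagate here because the name is critical
        "time": optional_text(get_time),
        "message": optional_text(get_message),
    }
```

With this shape, a chat card that is missing its last message would still be listed by name instead of aborting the whole scan.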
I have no Chrome extensions, and chromedriver launches a completely different instance of Chrome, doesn't it? Just to be sure, I disabled all extensions and ran the script again, which gives me the same stack trace.
I also set last_chat_msg to an empty string, but this too gives me the same error.
So here's what I am thinking:
I have tried this script on two MacBook Pros (13" and 15") and a Windows machine. I get the same error on all of them, so it is probably not related to the operating system or hardware.
I am using version 89.0.4389.82 of chromedriver and the browser, which is the latest version at the moment. Is it the same version you are using too? I can't think of any other issue which could cause this behavior.
We are using the same version of Chrome. Thanks for checking on multiple machines and ruling out hardware.
I wonder if WhatsApp is changing some of its UI based on your locale? Can we compare localization?
In chrome://settings/languages you can verify what language your browser is using. Mine is using English.
You can also check the Accept-Language header your browser sends. Mine is "en-US,en;q=0.5"
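To make two Accept-Language values easier to compare, the header can be split into language tags and weights. The sketch below is an illustration only: `parse_accept_language` is a hypothetical helper, not part of whatsoup.py, and it handles just the simple `tag;q=value` form.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    entries = []
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(';')
        q = 1.0  # A missing q-value means full preference
        for param in params.split(';'):
            key, _, value = param.strip().partition('=')
            if key == 'q' and value:
                try:
                    q = float(value)
                except ValueError:
                    pass  # Keep the default weight for a malformed q-value
                break  # Assumption: only the first q parameter counts
        entries.append((tag.strip(), q))
    # sorted() is stable, so equal weights keep their original order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


mine = parse_accept_language("en-US,en;q=0.5")
print(mine)  # [('en-US', 1.0), ('en', 0.5)]
```

Running the same function on the other person's header and comparing the first entries shows quickly whether the preferred locales differ.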
Also, just to clarify: setting last_chat_msg to an empty string is still not working for you? Can you try replacing the entire get_chats function starting at line 152 with this and running the script again? Please share any traceback if it throws an error again.
def get_chats(driver):
    '''Traverses the WhatsApp chat-pane via keyboard input and collects chat information such as person/group name, last chat time and msg'''
    print("Loading your chats...", end="\r")
    # Wrap entire function in a retryable try/catch because chat-pane DOM changes frequently due to users typing, sending messages, and occasional WhatsApp notifications
    retry_attempts = 0
    while retry_attempts < 3:
        retry_attempts += 1
        # Try traversing the chat-pane
        try:
            # Find the chat search (xpath == 'Search or start new chat' element)
            chat_search = driver.find_element_by_xpath(
                '//*[@id="side"]/div[1]/div/label/div/div[2]')
            chat_search.click()
            # Count how many chat records there are below the search input by using keyboard navigation because HTML is dynamically changed depending on viewport and location in DOM
            selected_chat = driver.switch_to.active_element
            prev_chat_id = None
            is_last_chat = False
            chats = []
            # Descend through the chats
            while True:
                # Navigate to next chat
                selected_chat.send_keys(Keys.DOWN)
                # Set active element to new chat (without this we can't access the elements '.text' value used below for name/time/msg)
                selected_chat = driver.switch_to.active_element
                # Check if we are on the last chat by comparing current to previous chat
                if selected_chat.id == prev_chat_id:
                    is_last_chat = True
                else:
                    prev_chat_id = selected_chat.id
                # Gather chat info (chat name, chat time, and last chat message)
                if is_last_chat:
                    break
                else:
                    # Get the container of the contact card's title (xpath == parent div container to the span w/ title attribute set to chat name)
                    contact_title_container = selected_chat.find_element_by_xpath(
                        "./div/div[2]/div/div[1]")
                    # Then get all the spans it contains
                    contact_title_container_spans = contact_title_container.find_elements_by_tag_name(
                        'span')
                    # Then loop through all those until we find one w/ a title property
                    for span_title in contact_title_container_spans:
                        if span_title.get_property('title'):
                            name_of_chat = span_title.get_property('title')
                            break
                    # Store chat info within a dict
                    chat = {"name": name_of_chat, "time": '', "message": ''}
                    chats.append(chat)
            # Navigate back to the top of the chat list
            chat_search.click()
            chat_search.send_keys(Keys.DOWN)
            print("Success! Your chats have been loaded.")
            break
        # Catch errors related to DOM changes
        except (StaleElementReferenceException, ElementNotInteractableException) as e:
            if retry_attempts == 3:
                # Make sure we grant user option to exit if DOM keeps changing while scanning chat list
                print("This is taking longer than usual...")
                while True:
                    response = input(
                        "Try loading chats again (y/n)? ")
                    if response.strip().lower() in {'n', 'no'}:
                        print(
                            'Error! Aborting chat load by user due to frequent DOM changes.')
                        if type(e).__name__ == 'StaleElementReferenceException':
                            raise StaleElementReferenceException
                        else:
                            raise ElementNotInteractableException
                    elif response.strip().lower() in {'y', 'yes'}:
                        retry_attempts = 0
                        break
                    else:
                        continue
            else:
                pass
    return chats
@oddtazz Checking in to see if the above suggestion resolved the issue for you?
Hey, sorry for the late reply, Eddy.
I managed to get it to work after your previous comment. It turns out you were using en-US,en and I was using en-IN,en,en-UK (English (India), English, and English (UK) languages).
My Accept-Language = "en-UK,en;q=0.9,en-IN;q=0.8"
The solution that worked for me was to make my Accept-Language look like yours.
Thanks for making this software!
@oddtazz Thanks for confirming! Quick question: did you solve this only by setting Accept-Language? For example, did you also set the WhatsApp language on your phone to English? I'm asking because I'm testing it with the en-UK locale you provided and still can't reproduce the error.
I added this line to setup_selenium():
# Set locale to @oddtazz's config
options.add_experimental_option('prefs', {'intl.accept_languages': 'en-UK,en;q=0.9,en-IN;q=0.8'})
HTTP Bin shows: "Accept-Language": "en-UK,en;q=0.9,en;q=0.9;q=0.8,en-IN;q=0.8;q=0.7"
However, I don't see any changes to the WhatsApp UI and my script doesn't throw any errors.
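One possible reading of the doubled q-values in that header, offered as an assumption rather than something verified in this thread: Chrome appears to treat intl.accept_languages as a plain ordered list of language tags and to append its own q-values, so q-values passed in the pref end up repeated. A small sketch that strips them first; `to_language_pref` is a hypothetical helper name, and it assumes the tags are already listed in preference order.

```python
def to_language_pref(accept_language):
    """Turn an Accept-Language header into a plain comma-separated tag list."""
    tags = []
    for part in accept_language.split(','):
        tag = part.split(';', 1)[0].strip()  # Drop any ';q=...' parameters
        if tag and tag not in tags:
            tags.append(tag)  # Preserve order and skip duplicates
    return ','.join(tags)


pref = to_language_pref('en-UK,en;q=0.9,en-IN;q=0.8')
print(pref)  # en-UK,en,en-IN
```

The resulting string could then be passed as the 'intl.accept_languages' value in the add_experimental_option('prefs', ...) call shown above, and the header checked again to see whether the duplication goes away.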
I managed to break this again today with accepted languages as "Accept-Language": "en-US,en;q=0.9".
I am sure I have made no changes to Chrome apart from fiddling with the language settings.
Maybe language is not the issue in this case, but that would lead to a bigger puzzle. What changed between 6 days ago and today, especially since the Chrome version is the same? I would say treat my issue as an edge case. Maybe I am doing something very different compared to others. (It would be helpful to know in what way though :p )
Regardless keep up the good work!
@oddtazz, just check to make sure the chat it's breaking on is not blocked. Mine was breaking and it wasn't a language issue. It broke on a blocked chat.
Yep, I've got the same issue of this breaking on a contact that I've blocked.
Same here. When I deleted the blocked contact, the process continued.
But I still get the issue. I'm not sure why; I'm still looking for the other cause.
My issue is different: the script runs perfectly, and at the end it asks me to select a format such as csv, txt, or html. When I select any format, it runs and asks whether I want to export another chat. After I answer no, it closes, but when I open the exported chat it only shows the header and has no content. Please help.
Closing this bug report as this project is not maintained anymore.
Chrome/chromedriver version: 89.0.4389.82
Python version: 3.8.2
The script opens Chrome, starts going through messages, and crashes randomly at different messages. Language is set to English.