Andereoo / TkinterWeb

Python bindings for Tkhtml.
MIT License

Node for found text #93

Closed: rodneyboyd closed this issue 6 months ago

rodneyboyd commented 7 months ago

After a successful frame.find_text, frame.html.current_node is None. Is there a way to get the node for the selected text? Thanks.

Andereoo commented 7 months ago

Hi! Not at the moment, but I can add it if needed. What are you trying to do?

rodneyboyd commented 7 months ago

Hi. I'm not sure how much detail you need, but basically I have some information stored in attributes that I'd like to be able to access without having to click on the text after finding it.

Andereoo commented 6 months ago

If you're using find_text as a way to filter elements and get their data, I would instead use frame.html.search(css_selector) to get the nodes. For large documents this would be faster and more foolproof than using find_text. For instance,

self.frame.load_html("<p>Some text</p><p wantthis>Some other text</p>")
self.frame.html.search("[wantthis]")

would return all nodes that match.

However, if for whatever reason you need to get the nodes found strictly by find_text, let me know and I will add it!

rodneyboyd commented 6 months ago

Hi, I do need to use find_text because it's for a user find/replace operation. I'm not sure if I could refactor it to use search(css) instead ... maybe? Btw, if you're curious about the app, you can download it at https://picardy-indexing.ca/downloads. It's an index-editing app that uses TkinterWeb for both preview and Help delivery.

Andereoo commented 6 months ago

Hi!

No, it is not possible to use search to mimic the functionality of find_text. find_text takes the text content of the website, uses RegEx to find matches, and then finds the corresponding nodes. You could make your own similar function, but there's no need to reinvent the wheel.
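For anyone curious what "find matches in the text, then map them back to nodes" can look like, here is a minimal standalone sketch. It is not TkinterWeb's actual implementation: the function name find_text_sketch, the (node, text) input format, and the ignore_case parameter are all assumptions made for illustration, and the search string is treated as a regular expression.

```python
import re
from bisect import bisect_right


def find_text_sketch(text_nodes, pattern, ignore_case=True):
    """Hypothetical illustration, not the library's code.

    text_nodes: list of (node, text) pairs in document order.
    Returns a list of (start_node, start_offset, end_node, end_offset).
    """
    nodes, starts, parts = [], [], []
    position = 0
    for node, text in text_nodes:
        if text:  # skip empty nodes so every start position is unique
            nodes.append(node)
            starts.append(position)
            parts.append(text)
            position += len(text)

    full_text = "".join(parts)
    flags = re.IGNORECASE if ignore_case else 0

    matches = []
    for m in re.finditer(pattern, full_text, flags):
        if m.start() == m.end():
            continue  # ignore empty matches (e.g. an empty search string)
        # Node containing the first matched character.
        si = bisect_right(starts, m.start()) - 1
        # Node containing the last matched character.
        ei = bisect_right(starts, m.end() - 1) - 1
        matches.append(
            (nodes[si], m.start() - starts[si], nodes[ei], m.end() - starts[ei])
        )
    return matches


doc = [("n1", "Some text"), ("n2", "Some other text")]
print(find_text_sketch(doc, "textsome"))  # [('n1', 5, 'n2', 4)]
```

The last line shows a match that starts in one node and ends in the next, which is why a match needs both a start node and an end node.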

I tweaked find_text so you can get the selected node. Passing detailed=True to find_text makes it return a tuple containing the number of matches, the selected node, and a tuple of all other matches.

Each match is returned as a tuple of four values. The first is the start node, the second is the text offset index from the start of the node, the third is the end node, and the fourth is the end node offset index. It returns two nodes because some searches could span multiple nodes, so the start node is the node at the beginning of the text that was found and the end node is the node that is at the end of the found text. In most cases these would be the same. The offset indexes are largely internal.
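A rough sketch of how the detailed result might be unpacked, going only by the description above. The helper name describe_selection is hypothetical, and it assumes the selected entry has the same four-value shape as the other matches; check the actual return value before relying on this.

```python
def describe_selection(result):
    """Hypothetical helper: unpack a (count, selected, matches) tuple.

    Assumes `selected` is a four-value match:
    (start_node, start_offset, end_node, end_offset).
    Returns None when nothing was found or selected.
    """
    match_count, selected, _matches = result
    if not match_count or not selected:
        return None
    start_node, _start_offset, end_node, _end_offset = selected
    return {
        "count": match_count,
        "start_node": start_node,
        "end_node": end_node,
        # True when the found text crosses a node boundary.
        "spans_nodes": start_node != end_node,
    }


# Intended use (assumption, requires a TkinterWeb frame):
#   info = describe_selection(frame.find_text("needle", detailed=True))
```

With the start node in hand, attributes stored on it (or on its parent element) can then be read without the user clicking on the text.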

I hope this helps! Let me know if you have any questions.

rodneyboyd commented 6 months ago

Thanks very much! It works as expected and provides the information I need.

By the way, something seems to have changed that causes find_text('') to fail with the following error:

TypeError: cannot unpack non-iterable int object

Andereoo commented 6 months ago

Happy to help!

Thanks for noticing that bug; I just fixed it.

rodneyboyd commented 6 months ago

By the way, is there a reason why tkhtml/Linux/32-bit/Tkhtml3.0.so was renamed to libTkhtml3.0.so? (Ditto for 64-bit.) I don't think it makes any difference, but I initially got an error when building my installer because it was expecting the old name.

Andereoo commented 6 months ago

I renamed some of the files to match the output filename when compiling Tkhtml. I had someone ask why they were named differently and I figured I would rename some of the files used here to save folks having to rename their files after compiling. Sorry about your installer; it never occurred to me it would be an issue!
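If a build script needs to cope with both names, one option is to look for either file. This is only a sketch: the helper find_tkhtml_binary is hypothetical, and the directory you pass in depends on where the package is installed.

```python
from pathlib import Path


def find_tkhtml_binary(directory, names=("libTkhtml3.0.so", "Tkhtml3.0.so")):
    """Hypothetical helper: return the first existing candidate file.

    Checks the new name first, then the old one; returns None if
    neither file is present in `directory`.
    """
    for name in names:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

An installer script could call this on the 32-bit or 64-bit folder and bundle whichever path comes back.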

rodneyboyd commented 6 months ago

No worries ... I figured it out after a few minutes :-)