causal-agent / scraper

HTML parsing and querying with CSS selectors
https://docs.rs/scraper
ISC License
1.81k stars 100 forks source link

Added feature flag `atomic` to make use of atomic `StrTendril` type. #102

Closed jaboatman closed 1 year ago

jaboatman commented 1 year ago

Enabling this feature allows scraper::Html and associated types to implement Send.

cfvescovo commented 1 year ago

Considering this for the next minor release

cfvescovo commented 1 year ago

cc @teymour-aldridge

adamreichold commented 1 year ago

@jaboatman Do you have any experience on the resulting performance impact?

(We currently live with the !Send bound in a multi-threaded async code base, but this repeatedly trips up my co-workers when they add a yield point while a Html is alive. Hence, I am wondering whether this would be worth it just for the ergonomics.)

j-mendez commented 1 year ago

@jaboatman changes look good. I was actually about to start adding these changes. Minor - instead of a util function an interface using “.into” or “from”.

jaboatman commented 1 year ago

@jaboatman Do you have any experience on the resulting performance impact?

(We currently live with the !Send bound in a multi-threaded async code base, but this repeatedly trips up my co-workers when they add a yield point while a Html is alive. Hence, I am wondering whether this would be worth it just for the ergonomics.)

I don't, I would imagine you'd need to benchmark it for your specific use case. We needed to store Html across await points which is why I implemented it. In our setting high performance isn't critical and as such I haven't noticed any problems with using the atomic flag.

j-mendez commented 1 year ago

@jaboatman changes look good. I was actually about to start adding these changes. Minor - instead of a util function an interface using “.into” or “from”.

Can actually just use SendTendril instead. https://github.com/causal-agent/scraper/pull/102/files#diff-af08c3181737aa5783b96dfd920cd5ef70829f46cd1b697bdb42414c97310e13R259 needs an update after the last performance commits if possible.

adamreichold commented 1 year ago

Can actually just use SendTendril instead.

I don't think this will work as SendTendril has a significantly reduced API surface which is the price it pays for being Send without changing the Atomicity type parameter.

jaboatman commented 1 year ago

https://github.com/causal-agent/scraper/pull/102/files#diff-af08c3181737aa5783b96dfd920cd5ef70829f46cd1b697bdb42414c97310e13R259 needs an update after the last performance commits if possible.

Done.

cfvescovo commented 1 year ago

LGTM @adamreichold @j-mendez are there any changes you would like to see implemented?

j-mendez commented 1 year ago

LGTM @adamreichold @j-mendez are there any changes you would like to see implemented?

LG2M thanks for asking.

adamreichold commented 1 year ago

LGTM as well. I am looking forward to the new release! Thank your work maintaining this crate.

j-mendez commented 1 year ago

Thank you for maintaining this crate as well @cfvescovo !

cfvescovo commented 1 year ago

Thank you for your contributions!

OlivierH commented 1 year ago

Thanks for this! Note that this is not currently showing in the docs.