Rishikant181 / Rettiwt-API

A CLI tool and an API for fetching data from Twitter for free!
https://rishikant181.github.io/Rettiwt-API/
MIT License
305 stars 31 forks

Question about normalization #424

Closed Glagan closed 4 months ago

Glagan commented 5 months ago

Hi, I recently started using this project as a replacement for the official API, and I noticed that newlines were deleted from the text response.
Following the request, I saw that the normalizeText function is applied, but I don't understand the point of it.

https://github.com/Rishikant181/Rettiwt-API/blob/9660117b88fdf8630cad06e86e501441fc31acf9/src/helper/JsonUtils.ts#L55-L65

Why are newlines being replaced by dots? I don't think they are "unnecessary characters" as the function description says.

How can I avoid this function being applied to the text? I didn't see any option to supply my own or to skip it. Should I use Rettiwt-Core directly and make my own requests instead of using the Rettiwt-API package?

Thanks.

Rishikant181 commented 5 months ago

Yeah, so while I was creating this package as part of one of my projects, I faced an issue where some tweets had no full stop between two sentences, just a newline in its place. This caused me issues while sending the tweets over to Google Cloud NLP for sentiment analysis. So, I just treated newlines as full stops and then removed any repeated full stops that might have been introduced during the removal of newlines.
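
For readers following along, the behavior described above can be sketched roughly as below. This is a minimal illustration, not the actual normalizeText source: the function name `normalizeNewlines` and both regular expressions are assumptions chosen to match the description (newlines become full stops, then repeated full stops are collapsed).

```typescript
// Hypothetical sketch of the described normalization; NOT the library's real code.
function normalizeNewlines(text: string): string {
    return text
        // Treat each run of newlines (plus surrounding whitespace) as a full stop.
        .replace(/\s*\n+\s*/g, ". ")
        // Collapse repeated full stops that the previous step may have introduced.
        .replace(/\.(\s*\.)+/g, ".")
        .trim();
}

console.log(normalizeNewlines("Hello\nWorld")); // prints "Hello. World"
```

Note that a sketch like this also collapses a deliberate ellipsis ("...") into a single full stop and discards line breaks that carry meaning for display, which is the loss of formatting raised in this issue.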

How can I avoid this function being applied to the text

Currently, there is no option to skip this function, but such an option may be introduced later.

should I directly use Rettiwt-Core and make my own requests

Yeah, you may. However, making requests directly will return the absolute raw data from Twitter. You will then need to filter out the meaningful data from the gibberish that it sends.

I have a question though: in terms of newlines, what's the behavior of Twitter's official developer API? I have personally never used it, so I don't know about it :P

Glagan commented 5 months ago

Newlines are just newlines. I'm not doing any kind of analysis on the text, just displaying it, so I don't really need anything done to it. I also don't think the official API does anything with them.

I'll just use Rettiwt-Core and filter out the data by copying the most important part from Rettiwt-API, thanks :)

Rishikant181 commented 5 months ago

Thanks for the feedback!

Jeto143 commented 4 months ago

I've just run into this issue. It would be great to have a way to preserve newlines (this could also be a different field, or maybe done by exposing the legacy data from the retrieved tweet).

Rishikant181 commented 4 months ago

@Jeto143 I think including an optional flag called rawText (or something similar) while creating a new Rettiwt instance will be a good solution.
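
A rough sketch of what such an opt-out could look like is given below. Everything here is hypothetical: `TextOptions`, `formatTweetText`, and the exact handling of the `rawText` flag are illustrative assumptions, not part of the Rettiwt-API package.

```typescript
// Hypothetical options object; the real package does not define this type.
interface TextOptions {
    /** When true, return the tweet text exactly as received. */
    rawText?: boolean;
}

// Hypothetical helper: normalizes by default, skips normalization when rawText is set.
function formatTweetText(text: string, options: TextOptions = {}): string {
    if (options.rawText) {
        return text; // preserve the original newlines
    }
    // Simplified stand-in for the normalization discussed in this thread.
    return text.replace(/\n+/g, ". ");
}

const displayText = formatTweetText("line one\nline two", { rawText: true });
const analysisText = formatTweetText("line one\nline two");
```

With a flag like this, callers who only display tweets could keep the original formatting, while callers feeding text to an analysis service could keep the normalized form.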

I'll start working on it tomorrow morning (it's midnight here).

Thanks for the feedback!

Rishikant181 commented 4 months ago

@Jeto143 I just chose to preserve the text formatting from Twitter and removed the text normalization I implemented altogether.

Expect changes in the next update!

Jeto143 commented 4 months ago

@Jeto143 I just chose to preserve the text formatting from Twitter and removed the text normalization I implemented altogether.

Expect changes in the next update!

Sounds good, thank you! That was fast 🔥