Closed swuckt closed 1 year ago
What's the creator name? And are you sure that the creator didn't just remove all that other content, or that you were subscribed to the creator before but weren't during the scraping process with 0.4?
These numbers are taken from the same day (today), maybe 45 minutes apart, so issues with being subscribed or content being removed shouldn't apply. I had this issue the entire month when I was still using the Python source, but was waiting to see if the latest release would fix it.
I can confirm that Message content hasn't been removed. Timeline Previews, I can't honestly confirm for you yet, but I can take the time to do that if you like.
It's mostly the messages that I'm after. v0.4 says it finds 58 message media that can be downloaded. Among them, only 3 out of 6 pictures that were sent to me.
But v0.3.5 detects all 142 message media and is able to get all 6 pictures.
> I had this issue the entire month when I was still using the Python source, but was waiting to see if the latest release would fix it.
My god you're so annoying, I waited like 2 extra weeks for people to just report issues like these to me. Then after I released a compiled executable (which is 10x the effort, without being able to digitally sign an executable file), you come reporting a bug you could've reported like 3 weeks ago when I first released the 0.4 version as Python source. I can't believe you expect me to fix a bug I don't even know of, and the audacity you have to even come here and type this issue ticket out is insane.
> My god you're so annoying, I waited like 2 extra weeks for people to just report issues like these to me. Then after I released a compiled executable (which is 10x the effort, without being able to digitally sign an executable file), you come reporting a bug you could've reported like 3 weeks ago when I first released the 0.4 version as Python source.
Hey, I'm sorry. I wasn't intending to cause you distress. The nuances and difficulties of OSS development are unknown to me.
Clearly, I've messed up somewhere. I hadn't realised you were working on a certain timeline and needed issues reported ASAP. The delay is because I thought the issue was on my end, so I've been trying different machines and configurations infrequently. I didn't want to open up an issue unnecessarily.
I respect your work and commitment to updating this repo beyond what is expected.
> I can't believe you expect me to fix a bug I don't even know of, and the audacity you have to even come here and type this issue ticket out is insane.
This is not my expectation at all. I have not made any demands and I am not here demanding a fix. Most of all, I'm absolutely not expecting you to solve all the problems on your own.
I thought this would be a safe place to open up a discussion. So far, I have only provided information.
I was hoping to get some information back.
But I'm feeling a little defensive right now. I'm going to take a deep breath and return when my head is clear.
Hey please check if this commit https://github.com/Avnsx/fansly-downloader/commit/115d549204268db8a5bf6d7bd5b425c8add1d71c fixes the issue you mentioned @swuckt
You can run the latest commit by installing the python version of fansly downloader: https://github.com/Avnsx/fansly-downloader#python-version-requirements
@swuckt Can you please respond with a simple "yes it fixes the messages bug" or "no, the bug is still not fixed for me", the heart emojis really don't help me fix all possible bugs atm.
No, the bug is still not fixed for me
Okay so after downloading and utilising the latest commit, how did the download numbers change for version 0.4?
What creator are you even using Fansly Downloader on, to validate that there's less content being downloaded?
Can you maybe cut it down to a specific section having less content downloaded or is it just generally downloading less?
Can you provide me examples of media that the download is missing out on?
To help me verify that this is a genuine bug: set download_mode to Single in the configuration file and attempt to download the content from that post. If the message states "1. duplicate declined," it means you have already downloaded the post previously. If it successfully downloads the content, it indicates that the downloader's functionality is working correctly. The only situation where a bug would be confirmed is if the message states "no scrapable media found" for the post you attempted to download from (and you can visibly see that there's media attached to that post). AvnDev@protonmail.com
Since opening this issue, I've received and sent more pictures and videos.
I tried downloading using your new commit, and compared the contents of the ../Messages/Pictures folders.
The main difference is that earlier images (which were downloaded by v0.4) are no longer there. They've been pushed out by newer images.
This makes me think there is a datetime problem or maybe a limit on how far back you can look into messages for media.
When the program starts, the scrapable media count is already wrong. From my earlier counts above, it's less than half of what it should be.
After I get home, I'm going to add some output lines to see how accessible_media, contained_posts, and parse_media_info() work. I'll have a better idea then.
For messages it would be interesting if you had a long enough message history with someone that could verify this thesis:
https://github.com/Avnsx/fansly-downloader/blob/2e85993d9ae02c09f1d66f6388b801f296f6b1e0/fansly_downloader.py#L1303-L1305
In version 0.3.5, I would just iterate over each content messages page in steps of 50, but after I realised they removed the maximum value for the limit parameter (sadly only for messages), I just set it to 9999 and expected everything to be downloaded within a single iteration. That works well for me.
Regarding your thought that the datetime might be the problem, here are some possibly influential factors:
In version 0.4 I decided to convert the timestamps reported by Fansly to the local system's timezone: https://github.com/Avnsx/fansly-downloader/blob/2e85993d9ae02c09f1d66f6388b801f296f6b1e0/fansly_downloader.py#L407-L429 I am using 24-hour format on my device and I wonder if my code above works properly for people that don't.
parse_media_info() is an absolute bugfest. I am really bad at efficiently parsing JSON API responses in any programming language, and on top of that the Fansly API is very unhandy, kind of randomly structured for various types of media, and has a lot of bugs, which made it even harder for me to properly and efficiently parse what I am looking for in the API responses.
Fansly's API does in general not report the correct timestamps, so I am switching between updatedAt and createdAt multiple times: https://github.com/Avnsx/fansly-downloader/blob/2e85993d9ae02c09f1d66f6388b801f296f6b1e0/fansly_downloader.py#L899-L913 If media reports wrong timestamps, it's most likely because it came from parsing updatedAt. I am doing that additionally because just using createdAt did not manage to provide unique enough filenames, so files would start overwriting each other. Maybe this is still a bug? Check if the output of Fansly Downloader is actually reporting the media IDs that are missing, while that media content does not exist within the final download folders (that would mean it has been overwritten with another file).
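On the 24-hour question above: Python's strftime %H directive always produces a 24-hour value, regardless of how the operating system displays its clock, so the display setting alone should not change the output. The following is a minimal sketch of a UTC-to-local conversion, assuming the API delivers epoch seconds in UTC (an assumption). format_media_timestamp is a hypothetical name, not the project's get_adjusted_datetime.

```python
from datetime import datetime, timedelta, timezone


def format_media_timestamp(epoch_seconds, tz=None):
    """Format a UTC epoch timestamp for use in a filename.

    tz=None converts to the system's local timezone; pass a tzinfo to pin
    the zone (useful for reproducing another user's output).
    Assumption: the input is epoch seconds in UTC.
    """
    utc_dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    local_dt = utc_dt.astimezone(tz)
    # %H is always the 24-hour clock (00-23), independent of OS display settings.
    return local_dt.strftime("%Y-%m-%d_at_%H-%M")


if __name__ == "__main__":
    print(format_media_timestamp(1684077660, tz=timezone.utc))
    print(format_media_timestamp(1684077660, tz=timezone(timedelta(hours=2))))
```

Pinning tz makes it possible to compare the same media item across two machines in different timezones.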
But yes, if anything were bugged, it would most likely require a fix in parse_media_info() (parses API responses) or sort_download() (downloads the media based on what parse_media_info() reports). Every section of Fansly (Timeline, Messages, Collections etc.) is tunneled through those two functions.
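The overwrite scenario described above (media reported by the downloader but missing on disk) could be checked by grouping parsed media by their target filename. This is a minimal sketch: the "id" and "filename" keys and the sample values are hypothetical and do not reflect the actual structure returned by parse_media_info().

```python
from collections import defaultdict


def find_filename_collisions(media_items):
    """Return {filename: [media ids]} for every filename used more than once.

    Assumption: each item is a dict with hypothetical "id" and "filename" keys.
    """
    by_name = defaultdict(list)
    for item in media_items:
        by_name[item["filename"]].append(item["id"])
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        {"id": "a1", "filename": "2023-05-14_at_15-21.jpg"},
        {"id": "b2", "filename": "2023-05-14_at_15-21.jpg"},  # same minute, same name
        {"id": "c3", "filename": "2023-05-14_at_15-22.jpg"},
    ]
    print(find_filename_collisions(sample))
```

Any filename that maps to more than one media ID would be written twice, with the later file replacing the earlier one.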
Also, I feel like for some reason the 0.4 version behaves differently on everyone's device and I can't figure out why. There are things that clearly work for me but don't work for others, e.g.: https://github.com/Avnsx/fansly-downloader/discussions/109 & https://github.com/Avnsx/fansly-downloader/issues/105#issuecomment-1594985511, https://github.com/Avnsx/fansly-downloader/issues/101#issuecomment-1589315921
> In version 0.3.5, I would just iterate over each content messages page in steps of 50, but after I realised they removed the max limit integer for limit (sadly only for messages) I just set it to 9999 and expected it to all be downloaded within a single iteration. Which works well for me.
This is probably the cause of my issue. I may be past that limit. I printed out post_object['messages'][-1] to get the oldest message, and it's definitely not the first message I sent or received.
Thanks for sharing that!
Guess I need to add something like this after the first iteration to move the cursor back.
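A minimal sketch of moving the cursor back page by page until nothing older is returned. The before cursor parameter, the "id" field and the fetch_page callable are all assumptions made for illustration and are not confirmed by this thread. A fake in-memory backend stands in for the real API so the loop can be run offline.

```python
def fetch_all_messages(fetch_page, page_size=50):
    """Collect every message by paging backwards until a page comes back empty.

    fetch_page(before, limit) is a hypothetical callable that returns messages
    newest-first, restricted to those older than the `before` cursor.
    """
    collected = []
    cursor = None  # None means: start from the newest message
    while True:
        page = fetch_page(before=cursor, limit=page_size)
        if not page:
            break
        collected.extend(page)
        cursor = page[-1]["id"]  # oldest message of this page is the next cursor
    return collected


def make_fake_backend(total):
    """In-memory stand-in for the API: ids total..1, newest first."""
    messages = [{"id": i} for i in range(total, 0, -1)]

    def fetch_page(before, limit):
        older = [m for m in messages if before is None or m["id"] < before]
        return older[:limit]

    return fetch_page


if __name__ == "__main__":
    everything = fetch_all_messages(make_fake_backend(142), page_size=50)
    print(len(everything), everything[-1])
```

The loop ends only when a page comes back empty, so it does not rely on a single large limit value being honoured by the server.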
Can you, in version 0.4, set download_mode to Messages and then, after this line: https://github.com/Avnsx/fansly-downloader/blob/2e85993d9ae02c09f1d66f6388b801f296f6b1e0/fansly_downloader.py#L1305
add:
from pprint import pprint
reachable_media = messages_req.json()['response']['accountMedia']
# pprint(reachable_media, indent=4, width=100)
print('\nRequested url:', messages_req.url)
print('\nTotal length of reachable items:', len(reachable_media))
print('\nThe most distant message in the past:', get_adjusted_datetime(reachable_media[-1]['createdAt']))
print('\nRequest Status Code:', messages_req.status_code)
print('\nResponse Headers:', messages_req.headers)
exit()
Save the Python file and then run the code with those changes on a creator (you have to change the Username variable in config.ini) with whom you have the most messages (ones that contain content and reach far into the past).
Then copy and paste the output here, and let me know the date of the first message that actually contained media content and the date the Python output reported for it.
It's also important that it reports status code 200 and about as much content as you actually have in there.
Furthermore, you can uncomment (remove the #) the # pprint(reachable_media, indent=4, width=100) line and it will show you everything it can parse. The content furthest in the past should be at the very bottom of the Python output and the latest media content at the very top.
Requested url: https://apiv3.fansly.com/api/v1/message?groupId=534534320386322432&limit=9999
Total length of reachable items: 62
The most distant message in the past: 2023-05-14_at_15-21
Request Status Code: 200
> [...] letting me know when the first message that contained media content was actually at and what date it said that it would be in the python output.
I can't remember the exact date, but transaction history puts it at March 24th 2023, which doesn't match the Python output.
> it will actually show you the whole thing that it can parse
If I understand what I'm seeing, is this roughly the same as post_object['accountMedia'][-1]? Except with media in different resolutions.
Do you not maybe have a chat history with some random creator who you happened to follow back in 2021 / 2022 and who has kept sending you those spammy messages from time to time ever since?
Because what you posted before does not reach far enough into the past, and I can't tell if you actually only got 62 media items during that time span or not. The most distant timestamp is also not 100% accurate, because Fansly doesn't report timestamps correctly anyway, so there might be a difference of a couple of months. That is why I need you to try it, as explained before, on someone whose chat history dates back to other years.
> save the python file and then run the code with those changes on a creator (you've to change the Username variable in config.ini) who you have the most messages (that contain content and reach far into the past) with.
I think I am not understanding the conditions you want me to test.
Is it more important that there are lots of messages, or that it reaches far back into the past? Or both?
If you are trying to test the 9999 limit, then time wouldn't matter, since, theoretically, it's possible to send 9999 messages within 1 week. Or I could also be misunderstanding what the limit means.
> Do you not maybe have a chat history with some random creator, who you happened to follow back in 2021 / 2022 and ever since then the person kept sending you those spammy messages from time to time?
I opened my account March 19 this year.
> I can't tell if you actually only got 62 media items during that time span or not.
That sounds correct for the time frame. Note that the media item count includes outgoing pictures and video too. I'm going to browse the response object and cross reference it with known dates, and see if I can verify or disqualify the timestamps.
You should correct my thinking here, but my interpretation of the numbers is as follows:
- messages_req.json()['response']['accountMedia'] cannot reach back far enough into the past, and the response only includes past messages starting in May.
- Total length of reachable items should decrease

I have a busy day, so my next response will be very late.
Can you switch to the Python source I uploaded into this repository? It basically reverts the messages change I made for 0.4 and replicates how it was in v0.3.5: https://github.com/Avnsx/test-repository/blob/main/fansly_downloader.py
Just press the "Copy raw file" button and paste it into your current Python version of 0.4 and let me know if that fixes your issue or not.
Also, why did you point out in the first place that the messages change was the cause of your issue, if you initially named the issue ticket "Latest v0.4 release finding less media than v0.3.5" and pasted stats where previews would also download way less?
> ... let me know if that fixes your issue or not.
The Messages issue is fixed. Thank you!
It may even be performing better. It's able to find an Audio media that Scraper 0.3.5 couldn't find, even though it existed back when I opened the issue.
(btw you changed the capitalization of the file and your link is giving me a "404 - page not found", but I figured it out)
> Also why did you randomly point out in first place that the messages change, was your cause of the issue...
A few reasons:
It seemed the most natural way to do it at the time. If the newest release doesn't match the features of the previous one, that seems to be an appropriate issue to raise.
The fact is, 0.4 finds less media. If it happens for me, it may be an issue for all users. My personal preference for Messages shouldn't stop me from reporting the Timeline Previews issue.
Or do you mean I should separate it into two issues?
That's not to say I don't care about the Timeline Previews issue. But, one thing at a time.
> Or do you mean I should separate it into two issues?
Yes, that is what I meant.
Ok so scraping from messages is fixed, but there's still less previews being downloaded from timeline?
Can you name a creator for which I can verify & debug this with?
Or better give me the post ids, which contain previews that you could download with 0.3.5, but can't in 0.4
You're aware that I can't fix the previews downloading issue if you don't tell me the creator name right? @swuckt
Thanks for your patience. I had busy work days.
I'll update this post with the other details if possible.
User is sexyflo,werwater
Can't replicate any downloading issues, but neither do I think that's the correct username that you were originally complaining about, because in your initial issue message you mentioned completely different numbers for downloaded content in 0.4 vs 0.3.5.
Regardless I couldn't care less if you name me the correct creator name or not, you're the one that is going to not be able to download content afterwards. Other people used the scraper on a bunch of creators too and no one said anything about preview content missing in timeline. Just don't come back to me in a month saying, you knew about this bug before, but magically expected it to get fixed.
Finally, I released various commits which introduce a new module called rich (it needs to be installed with pip install rich). It is used to display loading bars, specifically on content that is bigger in file size. Additionally, it fixes various bugs.
Would be nice of you, if you downloaded the latest python version and helped me test it.
I'll have the details later today.
I'd be happy to test it. Is there anything specific you want done?
@Avnsx
That sent me on a journey.
Things I tried:
Link to spreadsheet: https://docs.google.com/spreadsheets/d/1fMRbrjwhKNKQJypq0h1PbGkbc7C3VP1TxKIZuHLmea8/edit?usp=sharing
From the spreadsheet, and from browsing the downloaded files by hand, here are my findings about v0.4:
Unrelated findings/Separate Issues:
Next steps:
Please tell me you used the python 0.4.1 version by downloading it as a zip from the repository and not the 0.4 version linked within the releases page. The 0.4 version was outdated at this point, I had fixed a similar issue where some post content would not be downloaded.
The dates being wrong is also whatever, I don't really care. But I do believe you when you say that 0.3.5 had more accurate dates, as I was mentioning before I am not exactly the best at parsing json responses.
> - A handful of newer videos downloaded with v0.4 are not hashed (see spreadsheet > TimelineVideos, column L)
That's normal. .m3u8 videos do not get hashed as they're being downloaded, due to the way .m3u8 videos are structured (they come in .ts chunks and Fansly Downloader utilises the local GPU to transcode and merge them into an actual .mp4 video). They do get hashed, and the hashes are appended to the filenames, the second time you start Fansly Downloader to update a previous download folder.
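Hashing a finished file in a later pass and appending the digest to its name could look roughly like the sketch below. The choice of MD5, the "_hash_" separator and both function names are assumptions for illustration and do not show the downloader's actual scheme.

```python
import hashlib
import io
from pathlib import Path


def stream_digest(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large videos never sit fully in memory."""
    digest = hashlib.md5()  # assumption: any stable digest would serve here
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hashed_name(filename, hex_digest):
    """Append the digest to the stem; '_hash_' is a hypothetical separator."""
    p = Path(filename)
    return f"{p.stem}_hash_{hex_digest}{p.suffix}"


if __name__ == "__main__":
    fake_video = io.BytesIO(b"hello")  # stands in for a merged .mp4 file
    print(hashed_name("2023-05-14_at_15-21.mp4", stream_digest(fake_video)))
```

For a real file, the stream would come from open(path, "rb") once the merged .mp4 exists on disk.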
> - Still have the issue where .jpegs are downloaded as .pngs
This is the only interesting issue to me, because that should've been fixed in commit: https://github.com/Avnsx/fansly-downloader/commit/6b0c8d56f54145ea87002ea15f506ca933660d1d Can you point out a post ID which still has images downloading as videos?
Finally I'm closing this issue ticket, because as far as I comprehend the situation;
Today, testing on the same creator, I get different results with v0.3.5 versus v0.4
Scraper v0.3.5
Reported Total: 986 Pictures, 259 Videos, 203 Duplicates declined
Breakdown of Totals:
- Messages: 97 Pictures, 45 Videos
- Timeline: 614 Pictures, 136 Videos
- Timeline Previews: 275 Pictures, 78 Videos
Downloader v0.4
Reported Total: 890 Pictures, 193 Videos, 2 Duplicates declined
Breakdown of Totals:
- Messages: 34 Pictures, 24 Videos
- Timeline: 808 Pictures, 139 Videos
- Timeline Previews: 48 Pictures, 30 Videos
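The per-section differences between the two reports can be worked out with a few lines of Python. The figures are copied from the two reports above. The dictionary layout and the function name are only illustrative.

```python
# (pictures, videos) per section, copied from the two reports above
V035 = {"Messages": (97, 45), "Timeline": (614, 136), "Timeline Previews": (275, 78)}
V04 = {"Messages": (34, 24), "Timeline": (808, 139), "Timeline Previews": (48, 30)}


def section_deltas(old, new):
    """Return {section: (picture delta, video delta)}, computed as new minus old."""
    return {
        section: (new[section][0] - old[section][0], new[section][1] - old[section][1])
        for section in old
    }


if __name__ == "__main__":
    for section, (pictures, videos) in section_deltas(V035, V04).items():
        print(f"{section}: {pictures:+d} pictures, {videos:+d} videos")
```

Summed over all sections, v0.4 reports 96 fewer pictures and 66 fewer videos than v0.3.5, which matches the difference between the two reported totals.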