The feeds are put in Redis when there is something "new" ("new" = published between the last time the trigger ran and "today"), and they are twice in the cache because I set two keys: once as
'th_rss_' + str(trigger_id)
and once as
'th_rss_uuid' + str(trigger_id)
The second key lets TriggerHappy serve a feed from a UUID. This is useful when you want to track tweets for a given hashtag: you can then fetch that information from the feed that TriggerHappy builds.
So I think it's not a bug.
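Roughly, the double caching described above could look like this minimal sketch, assuming Django's cache framework is backed by Redis (the function name and the feed_content variable are illustrative, not the project's actual code):

```python
from django.core.cache import cache

def cache_feed(trigger_id, feed_content):
    # Same content stored twice, on purpose:
    # the first key is read back when the trigger runs,
    # the second one lets TriggerHappy serve the feed from a UUID.
    cache.set('th_rss_' + str(trigger_id), feed_content)
    cache.set('th_rss_uuid' + str(trigger_id), feed_content)
```

That is why seeing the same content under two keys in Redis is expected, not a duplication bug.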
OK, so there is an issue with my Dotclear Atom feeds, as they were put in the cache every time on the old instance.
On the new one, I actually don't see anything in Redis.
Let's close this one for now. I need to understand better what happens...
If you show me the RSS, I could analyze it. I have seen so many malformed feeds that I could find the same thing here too.
I'll keep it open.
I may have found the beginning of an explanation in issue #129.
I made 2 triggers with your Atom feed to create notes in Evernote.
I'm digging now into what goes into the cache but does not come out of it.
It's really crazy...
All of that to improve performance with multiprocessing...
That's what I used, in fact, but it did not change anything in my case.
If I were to implement a clone of th, I think I would:
And as you did, I would have 2 collector / publisher actions.
In fact, the data from your feeds were not published because, in the database, I had set a date that was not old enough to let the feed be published. Otherwise it's working.
About your suggestions: I was expecting Celery/Redis to help with that. Maybe I should look for a more sophisticated queuing system which would trigger the tasks when an entry limit is reached. But I don't know if it would hold up with a big quantity of data to handle.
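For what it's worth, a threshold-triggered queue can be sketched with plain redis-py; the key name, limit, and process_batch hook below are all invented for the example:

```python
import redis

r = redis.Redis()

ENTRY_LIMIT = 100                 # arbitrary threshold for the example
QUEUE_KEY = 'th_pending_entries'  # hypothetical key name

def process_batch(batch):
    for raw in batch:
        print(raw)  # placeholder: hand each entry to the real task

def enqueue(entry):
    # RPUSH returns the new list length, so we know when the limit is hit.
    if r.rpush(QUEUE_KEY, entry) >= ENTRY_LIMIT:
        batch = r.lrange(QUEUE_KEY, 0, -1)
        r.delete(QUEUE_KEY)
        process_batch(batch)
```

(A real implementation would need a pipeline or a Lua script to drain the list atomically when several workers push at once.)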
Do you really have that quantity of data, or such performance requirements?
As the documents could be simple arrays, or hstore or JSON documents in Postgres, one could suggest MongoDB, but I would prefer not to.
At work we have started to use Kafka in a Hadoop context, but it seems overkill for such a need (it requires a ZooKeeper system plus Kafka nodes). I was also thinking of Elasticsearch, but it seems too far from the basic use cases.
And the Hadoop ecosystem has all the features, but it's so overkill :-D
If you stay with Redis/Celery, maybe you implemented it too fast. Why not start with only Redis, for example, and then add Celery?
I don't know Redis well enough to say whether it's a good choice or not. It seems hashes could be seen as documents.
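On that last point: a feed entry could indeed be stored as a Redis hash, which is reasonably document-like. A small redis-py sketch, with the key and field names invented for the example:

```python
import redis

r = redis.Redis()

# One key per entry, one hash field per attribute.
r.hset('th_entry:42', mapping={
    'title': 'Hello world',
    'link': 'http://example.org/hello',
    'published': '2015-11-27T10:00:00Z',
})

entry = r.hgetall('th_entry:42')  # dict of field/value pairs (as bytes)
```

Unlike a real document store, though, hash values are flat strings, so nested structures would still need to be serialized.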
Or maybe RethinkDB too?
I saw pykafka too, several weeks ago, when I read a post about IFTTT's architecture. And I liked it :) But as you said, adding several Java processes for such a small project... :=)
Why Redis: because we can use it without thinking about it; with Django we just use two things, cache.set() and cache.get(), to use the cache system. Why Celery: because it's simple to trigger several processes at once, which is what I want instead of a serial process.
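The Celery side of that can stay very small. A sketch, not the project's actual task code (the task name and broker URL are assumptions):

```python
from celery import Celery

app = Celery('th', broker='redis://localhost:6379/0')

@app.task
def read_feed(trigger_id):
    print('reading feed for trigger', trigger_id)  # placeholder for the real work

def fire_all(trigger_ids):
    # .delay() queues every trigger at once instead of looping serially,
    # so several workers can handle feeds in parallel.
    for trigger_id in trigger_ids:
        read_feed.delay(trigger_id)
```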
> In fact, the data from your feeds were not published because, in the database, I had set a date that was not old enough to let the feed be published. Otherwise it's working.
I'm not sure I understand your concept of database (Redis vs Postgres) and "old enough"; otherwise, it should have worked on the initial instance. But as I have dropped everything by now... I cannot test it again.
When I test a feed again and again, I change date_triggered in the triggerservices table to a date before the date of the feed.
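For instance, something along these lines with the Django ORM; the model import and name are guesses based on the table mentioned above, and only the date_triggered column comes from the comment itself:

```python
from datetime import timedelta
from django.utils import timezone

# Hypothetical import: assuming the model behind the triggerservices
# table is called TriggerService; adjust to the real name.
from django_th.models import TriggerService

def rewind_trigger(trigger_id, days=30):
    # Push date_triggered back so it predates the feed entries,
    # making the trigger treat them as "new" again on the next run.
    TriggerService.objects.filter(pk=trigger_id).update(
        date_triggered=timezone.now() - timedelta(days=days)
    )
```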
Regarding the Celery message mentioned in #51: does that message mean Celery does not work? I thought it was only a simple warning that would not prevent it from working. So is it a simple warning, or blocking? As I saw things happening in the logs, I thought it worked.
Yes, it's working; that's an old comment.
Ok, so I may have missed something else in the initial app. Never mind.
Hi,
I have three RSS feeds, but when I look into Redis, I see 4 keys related to RSS, and each time it's twice the same content.
Whereas in the log:
Feeds 3 & 4 are "RSS To Twitter" and "Tweet shared feed from tt-rss nothing", which are the same, by the way.