fasheng / elfeed-protocol

Provide extra protocols to make like Fever, NewsBlur, Nextcloud/ownCloud News and Tiny Tiny RSS work with elfeed
GNU General Public License v3.0
98 stars 18 forks source link

Very Slow Sync Performance With Miniflux Fever API #44

Closed rbugajewski closed 1 year ago

rbugajewski commented 3 years ago

elfeed-log

It is very hard to collect any logs, because Emacs does not react during sync, but from what I can see the Fever API is handled correctly. get entries is called repeatedly with different entry IDs. Then at some point retrieve is called which hangs for several minutes:

[2020-12-15 09:27:12] [debug]: retrieve
(http://news.example.com/fever/?api&items&with_ids=91778,79601,95947,95948,80171,80173,80174,80286,80287,80290,79935,86768,80296,80294,95949,82520,80838,94536,97833,76941,104911,91637,86049,84214,56954,82522,82521,81053,95115,92460,77285,91032,80175,80176,97794,90973,80049,79871,84830,84831,83226,83227,79603,79602,84832,97821,84833,84834,85540,106642
http://news.example.com/fever/?api&items&with_ids=81152,103160,80378,85541,85542,89988,102476,117525,84836,84837,106624,113001,102205,85543,92462,95032,102243,82523,80353,80309,80725,79309,86291,79311,80020,103185,93401,106625,84839,79604,84838,98083,95102,72521,85943,103186,74343,79605,79606,90986,91038,91039,91709,94956,101566,82524,95712,112711,131381,78703
http://news.example.com/fever/?api&items&with_ids=97998,106577,112757,97372,100700,112983,112984,80083,98052,88883,112986,95103,112988,133384,119064,85303,95053,95057,99458,66070,95088,129017,84852,84853,92619,95950,83991,133385,91036,131385,112987,94986,80350,107888,80828,125201,84849,84850,84851,112989,86371,100232,86726,99460,79609,99462,99463,79608,80574,84854
http://news.example.com/fever/?api&items&with_ids=84855,84856,134587,106304,99461,60862,84569,84857,84858,84860,84861,103187,93727,91037,84859,106306,78232,103353,103188,107078,106930,108317,81761,84862,72632,103189,84863,78826,92838,76459,103190,86805,84864,94271,80318,103191,103192,93568,103193,79721,77135,110626,92430,91083,80128,103194,131441,82337,110638,99464
http://news.example.com/fever/?api&items&with_ids=81762,124254,129616,85445,103111,121530,79607,112275,110627,95951,109332,112857,111839,92820,92819,84763,112299,86292,51742,80133,91178,44964,99466,80134,80135,112300,112274,121531,103042,105659,87293,83230,80304,90306,89923,86823,83046,82526,82705,84866,83231,90426,112861,87278,80576,108258,90114,107528,87069,81662
http://news.example.com/fever/?api&items&with_ids=79754,67190,90308,87168,90307,107876,84868,84869,82527,108411,78879,82528,98700,48487,79755,82529,82531,82532,91901,78944,90420,84694,89704,90421,110111,129618,113057,91861,82530,90180,97541,84870,91929,79072,82289,103195,79285,85059,84477,110629,90145,90583,103196,86302,103197,84962,89503,44130,90147,78585
http://news.example.com/fever/?api&items&with_ids=77816,108365,90428,82533,85478,89247,106699,82998,91930,106700,82534,84873,91931,87285,105814,84493,83599,84560,94888,98054,99408,83600,80149,82487,99409,79614,99410,87257,84561,90427,98067,76244,99411,99413,82222,78420,89623,99479,77451,79615,58648,112262,84411,89254,91933,91190,82488,87320,103198,90920
http://news.example.com/fever/?api&items&with_ids=103199,125202,89245,80789,80183,82095,107752,112301,76855,106701,79617,79298,81560,110604,89246,88841,91936,81224,90234,106702,79357,112215,82381,86985,91442,77710,90155,91934,91550,91935,41725,80836,89679,79669,78104,86715,80837,91548,87112,86106,87172,84424,99414,103200,80839,80841,135135,91549,80842,173367
http://news.example.com/fever/?api&items&with_ids=90625,84539,106689,99415,99416,98084,80184,78229,77454,110605,87321,89248,81110,93321,80185,110309,112997,125605,106610,84204,90064,141436,89861,108384,110606,80141,108318,80072,118677,101127,89249,89250,88001,87835,108400,85518,89251,84563,89252,102997,89233,86374,113000,89253,92722,95108,95955,99473,89255,87027
http://news.example.com/fever/?api&items&with_ids=95109,101130,101131,101132,77963,98801,90422,90423,90424,90425,79566,110612,110617,110618,101136,110613,110543,101133,101135,90165,78262,98070,93645,110614,112864,110615,108606,98804,80186,106580,104426,133388,78512,78513,94914,90939,94912,94913,83891,133389,82539,83770,82540,110620,80737,78366,92504,141437,110621,110622
http://news.example.com/fever/?api&items&with_ids=106477,141438,79930,110625,101972,92811,110623,110624,82461,82460,101978,141439,101979,90224,171158,83734,110628,99308,94915,94916,94917,94918,111943,92421,90884,141440,82682,110630,110631,110632,99474,78367,141441,96633,99476,132624,74092,79682,94892,82244,148496,124386,101984,94396,99477,68997,99478,78368,110489,80273
http://news.example.com/fever/?api&items&with_ids=90589,122359,92726,106704,89781,82965,99050,98073,82606,79269,130951,117094,119068,57026,59033,91175,99480,141442,78450,90314,141443,86375,101139,101747,86687,90882,90883,79460,102654,75897,101987,124396,129623,105214,102584,95957,86376,91440,91441,108562,84676,110447,84674,95926,84210,84675,95141,90885,84422,92891
http://news.example.com/fever/?api&items&with_ids=82968,91443,90873,91444,94734,84441,91421,92892,106644,76224,91342,29243,84620,110634,92822,133619,106669,94429,133696,80983,133697,91236,82969,97976,80274,89921,79272,79270,80261,99256,81374,83702,98074,110730,75372,110633,91100,108414,113008,78562,80079,113009,103776,98091,102585,82068,97446,78645,110635,129481
http://news.example.com/fever/?api&items&with_ids=92528,92009,82472,92529,79146,77671,85568,87860,92530,87859,91095,79724,80984,80738,79725,92485,94881,84496,85569,88889,80942,84658,89688,78520,85570,132628,90819,85571,85572,99221,101141,117119,74327,82541,78712,82712,79149,98090,76130,85573,85574,96777,92531,88888,83856,88890,83857,88891,80002,83974
http://news.example.com/fever/?api&items&with_ids=85525,91711,90720,115288,102586,87297,102587,102588,102590,102589,124404,102594,101876,82542,76454,90799,112233,148528,106452,124405,101142,101143,109336,119472,94893,76664,102483,98952,102591,102593,92727,101144,112235,112237,112238,91476,73750,83656,72433,84413,101145,79367,99167,79242,81146,77761,91301,87504,106349,109339
http://news.example.com/fever/?api&items&with_ids=108417,78358,80213,106646,101990,124407,98957,94487,99224,80192,88896,102601,88897,80193,108389,102600,76812,102602,78363,79279,79238,88791,78236,78330,83498,124408,102603,91672,99202,101147,80195,94840,94866,124411,124412,79276,77519,88420,80017,88676,79277,79275,79278,78683,78097,82468,93986,89450,92661,99203
http://news.example.com/fever/?api&items&with_ids=80569,80085,91561,79280,92459,83743,78325,79237,91451,90366,83748,119827,99204,107268,113099,109285,124413,79255,101696,79252,79260,101991,78660,82971,141444,82970,94189,89161,106804,119829,109446,77053,81549,79281,81490,92305,83749,88655,101993,76802,76688,82202,79282,91893,118533,101994,77448,77246,78720,106647
http://news.example.com/fever/?api&items&with_ids=78721,124414,93558,141445,78713,74657,91664,94154,107478,109300,91130,109452,84593,87847,80284,92730,99225,79265,79266,78730,87797,99226,94082,101995,91647,82543,79283,99227,68133,86693,89210,77289,90376,91675,84865,99206,91636,74738,85735,89211,76854,83848,82690,78208,88574,99228,93987,91535,76933,91232
http://news.example.com/fever/?api&items&with_ids=80988,85822,94718,99229,99230,85825,95121,90345,101997,86487,80987,157034,148529,119831,85823,82469,119834,80989,81377,96671,97516,158045,96673,96674,157026,107480,82615,91937,86455,99120,75183,82470,107481,107482,100468,119835,99236,107483,95123,74850,90419,93597,119136,119777,80197,100590,105358,119836,81701,80275
http://news.example.com/fever/?api&items&with_ids=95021,90287,119837,113011,93552,89309,89308,78567,97442,78541,89258,80206,89307,78028,89310,80167,82400,76275,89311,123896,113010,77958,82544,74853,112963,117124,78306,83597,96669,119731,106473,81258,68616,78118,87612,93526,118186,81084,86742,81703,85581,96677,106474,91147,88568,67902,102609,131254,110019,124480
http://news.example.com/fever/?api&items&with_ids=96678,102610,102442,134558,106475,131276,91148,118684,91540,81839,76671,93257,88129,91149,93085,98967,80196,88895,106962,107442,119071,89187,67333,109448,90684,107450,116973,113014,109380,99102,109456,82367,111331,117004,103073,99047,85630,94066,95413,105459,53135,92320,89312,89313,119072,94566,96690,127808,124483,89114
http://news.example.com/fever/?api&items&with_ids=131255,107435,109382,95412,99003,99004,124420,113017,99005,99006,89961,101150,79955,99007,99008,113018,113019,98215,80994,113020,118895,84781,91150,92331,62750,90070,62711,89315,106468,93643,124485,124484,82824,91140,111283,124486,101152,95414,109454,109033,80997,99238,80996,85631,109248,92623,95415,95408,95410,95411 

I had to cut off the log output, because GitHub doesn’t allow such large comments.

error backtrace

No error is present, this is a sync performance issue. As far as I understand elfeed keeps one file for each entry. After running elfeed-update with elfeed-protocol integration for the whole night, then doing a wc on ~/.elfeed/data there were about 6000 files/entries.

Miniflux is a a RSS client that focuses on performance. It can easily handle tens of thousands of entries. A typical client syncs my setup in minutes. I expect that a first sync with 100.000 entries should take no longer than 30 minutes.

fasheng commented 3 years ago

Fever API limit maximum article size to 50 for each request(Miniflux origin API has similar limitation). So elfeed-protocol call curl command only fetch 50 articles in each update operation. This should be the performance bottlenecks if there are huge number of unread or starred articles. How ever, if your network is OK, I don't think fetch 1000 articles will take more than 10 minutes.

Besides, if you want to fetch huge number of articles besides the first sync operation, the following code will be useful:

(setq my-elfeed-update-timer
      ;; decrease seconds depend on your network performance
      (run-at-time 15 15
                   (lambda () (when (= elfeed-curl-queue-active 0)
                                ;; use elfeed-protocol-ttrss-update-older instead for older articles
                                (elfeed-protocol-ttrss-update "ttrss+https://user@host")))))
(cancel-timer my-elfeed-update-timer)
rbugajewski commented 3 years ago

Thanks for your fast response.

However, if your network is OK, I don't think fetch 1000 articles will take more than 10 minutes.

This speed would be perfectly acceptable as I could run the initial sync overnight, the incremental syncs will be much faster. Currently I can’t achieve such speeds though, but I added your hints to my configuration which now looks like this and I will run it overnight once again:

(use-package elfeed-protocol
  :after elfeed
  :config
  (setq elfeed-use-curl t)
  (defadvice elfeed (after configure-elfeed-feeds activate)
    "Make elfeed-org autotags rules works with elfeed-protocol."
    (setq elfeed-protocol-tags elfeed-feeds)
    (setq elfeed-feeds (list
                        (list "fever+http://admin@news.example.com"
                              :api-url "http://news.example.com/fever/"
                              :password "p4s5w0rd"
                              :autotags elfeed-protocol-tags))))
  (setq my-elfeed-update-timer
        ;; decrease seconds depend on your network performance
        (run-at-time 15 15
                     (lambda () (when (= elfeed-curl-queue-active 0)
                                  ;; use elfeed-protocol-ttrss-update-older instead for older articles
                                  (elfeed-protocol-fever-update "fever+http://admin@news.example.com")))))
  (cancel-timer my-elfeed-update-timer)
  (elfeed-protocol-enable))

I’m not sure what the bottleneck is, because two other clients I have tested can sync much faster. The sync takes place over a 1 Gbps connection, and I verified that the network bandwidth isn’t saturated. The server’s load is way below its limit. Emacs on the client however takes 100% CPU during elfeed-update.

rbugajewski commented 3 years ago

I’ve tried to check how fast the sync is on my machine and I don’t think that it would finish overnight with this speed:

➜  ~/.elfeed/data for i in {1..6}; do find . -type f | wc -l; sleep 10; done
    1284
    1284
    1284
    1284
    1325
    1325

This is about 50 entries per minute.

This runs on a 2,9 GHz Quad-Core Intel Core i7 (Turbo Boost up to 3,8 GHz). Is there something else I can try to make the sync faster?

fasheng commented 3 years ago

Oh, please ignore the last line in my example code, and the update timer only work for the incremental syncs.

(cancel-timer my-elfeed-update-timer)

This is about 50 entries per minute.

Looks your network bandwidth and computer performance are both fine, maybe the result JSON buffer is too large so emacs parse it slowly and takes 100% CPU all the time. You could catch the curl command line(just ignore prefix args --disable --compressed --silent --location -w) and debug it in the console:

ps aux | grep curl
rbugajewski commented 3 years ago

I’ve now a much faster machine and I wanted to retry my elfeed-protocol setup, but now I’m getting errors on the initial sync:

[error]: http://news.example.com/fever/?api&items&with_ids=260685,266085,324987,266082,266083,266086,266084,339460,283184,189692,264061,276098,270801,324989,294909,292525,308143,294544,262102,266087,284374,308409,325956,263106,266088,275414,299249,294833,308065,266089,278032,292528,275307,276075,262354,262604,266094,266093,266095,300231,271718,266092,262945,263295,264418,266096,264640,266097,262231,262232: "(126) Unknown curl error!"
[error]: http://news.example.com/fever/?api&items&with_ids=271721,271720,356356,278034,270728,278036,270727,278035,275053,287347,270867,260904,261329,325216,294837,260850,261251,330763,261202,278233,262610,262952,264724,266255,261437,271507,289059,262592,262589,260674,238585,284278,260792,260152,263186,287349,287348,261813,276345,286386,265802,263435,259634,264702,264908,268874,289060,263437,313130,287353: "(126) Unknown curl error!"

How can I debug this?

fasheng commented 3 years ago

I'm not sure, how about you run curl directly in bash:

curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&feeds'
curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&saved_item_ids'
curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&unread_item_ids'
curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&items&with_ids=260685,266085'
rbugajewski commented 3 years ago

I'm not sure, how about you run curl directly in bash:

Thanks for the hint! I’ve cut the responses after some characters to make the comment more readable:

curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&feeds'

{"api_version":3,"auth":1,"last_refreshed_on_time":1618387652,"feeds":[{"id":8,"favicon_id":0…

Response looks good.

curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&saved_item_ids'

{"api_version":3,"auth":1,"last_refreshed_on_time":1618387667,"saved_item_ids":"12274"}

LGTM

curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&unread_item_ids'

{"api_version":3,"auth":1,"last_refreshed_on_time":1618387679,"unread_item_ids":"260685,266085,324987…

LGTM

curl -H'User-Agent: Emacs Elfeed 3.3.0' -XPOST -d api_key=$(echo -n 'user:pass' | md5sum | awk '{print $1}') 'https://news.example.com/fever/?api&items&with_ids=260685,266085'

{"api_version":3,"auth":1,"last_refreshed_on_time":1618387710,"items":[{"id":260685,"feed_id":287,"title":…

This looks also like a correct, authenticated response.

Is there maybe a verbose option so that all curl commands get displayed in the log buffer so that I can debug further as these curl commands don’t seem to be the root of my issue?

fasheng commented 3 years ago

Yes, all the curl commands executed correctly.

curl error 126 means “command not executable", it's strange because elfeed-protocol fetched feeds and unread ids with curl just now which means curl is executable. So I guess your elfeed-protocol-fever-feeds is not empty. And you could execute the following code to check if elfeed-curl-enqueue is executable:

(elfeed-curl-enqueue "http://news.example.com" (lambda (status) (message "%s" (buffer-string))))

Besides, you could show more elfeed-protocol logs with:

(setq elfeed-protocol-log-trace t)
(setq elfeed-protocol-fever-maxsize 5)

I don't use MacOS, you have to debug it yourself. For example you could check if elfeed works without elfeed-protocol and append debug code under elfeed-curl-retrieve, good luck:

(elfeed-log 'debug "curl args: %s" args)
rbugajewski commented 1 year ago

@fasheng Was there some update I’m unaware off that took care of this issue, or what is the reason why this issue was closed as completed without comment?

fasheng commented 1 year ago

I have no idea with the issue, miniflux+fever+curl works fine for me everyday, if long time no update and could not reproduce in my machine, I will close it. If you have any thing new about the debug work that mentioned before, just append comment.

fasheng commented 1 year ago

I'm afraid you must debug it yourself:

  1. check if equivalent curl command in console slowly
  2. check if elfeed that without elfeed-protocol slowly
  3. check if elfeed-protocol with emacs-Q slowly
  4. check if emacs parsing json buffer slowly, that's the main difference with curl command https://github.com/fasheng/elfeed-protocol/blob/b27365ba82b2da095556dbb0084789e96cd54b61/elfeed-protocol-fever.el#L143C31-L143C31
rbugajewski commented 1 year ago

OK, thanks for the clarification, and the input. I appreciate it.