Closed Lanchon closed 8 years ago
Hm, that's interesting. I don't get that with mine. There is no deliberate delaying in the code.
Can you give the artist that you had the trouble with?
hi, thanks!
for instance:
soundscrape -f https://soundcloud.com/anjunadeep/sets/the-anjunadeep-edition
output:
Track already downloaded: The Anjunadeep Edition 105 With James Grant
Track already downloaded: The Anjunadeep Edition 104 with Ryan Davis
Track already downloaded: The Anjunadeep Edition 103 With John Monkman
Track already downloaded: The Anjunadeep Edition 102 With Esteble
Track already downloaded: The Anjunadeep Edition 101 with Luca Bacchetti
Track already downloaded: The Anjunadeep Edition 100 (Part Five) with Daniel Curpen - Live from London
Track already downloaded: The Anjunadeep Edition 100 (Part Four) with Cubicolor - Live from London
Track already downloaded: The Anjunadeep Edition 100 (Part Three) with Jody Wisternoff - Live From London
Track already downloaded: The Anjunadeep Edition 100 (Part Two) with Dom Donnelly - Live From London
Track already downloaded: The Anjunadeep Edition 100 (Part One) with Yotto - Live from London
Track already downloaded: The Anjunadeep Edition 99 With Vincenzo
Track already downloaded: The Anjunadeep Edition 98 With Matthias Vogt
Track already downloaded: The Anjunadeep Edition 97 With Atish
Track already downloaded: The Anjunadeep Edition 96 With James Grant (Live from Anjunadeep in Miami @ Cafeina)
Track already downloaded: The Anjunadeep Edition 95 With Jody Wisternoff
Track already downloaded: The Anjunadeep Edition 94 With Hisham Zahran
Downloading: The Anjunadeep Edition 93 With Cubicolor
[################################] 57611/57611 - 00:01:03
Downloading: The Anjunadeep Edition 92 With Theo Kottis
[################################] 65907/65907 - 00:01:14
Downloading: The Anjunadeep Edition 91 With Heliotype
[################################] 63076/63076 - 00:01:09
Downloading: The Anjunadeep Edition 90 With James Grant
[################################] 113441/113441 - 00:02:04
Downloading: The Anjunadeep Edition 89 With Öona Dahl
[################################] 57080/57080 - 00:01:03
Downloading: The Anjunadeep Edition 88 With Gab Rhome
[################################] 56244/56244 - 00:01:02
there is maybe a 15 to 30 minute delay after each [################################] xxxxx/xxxxx - xx:xx:xx
message and before the next Downloading: xxx
message. no delay if track already downloaded. no other messages produced.
I cannot reproduce this, unfortunately. Those tracks all download fine, one after another.
What operating system/python version are you on? Have you tried
pip install soundscrape --upgrade
?
hmmm...
Linux [...] 4.4.0-21-generic #37~14.04.1-Ubuntu SMP Wed Apr 20 16:33:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
already on latest...
$ sudo pip install soundscrape --upgrade
Requirement already up-to-date: soundscrape in /usr/local/lib/python2.7/dist-packages
Cleaning up...
my /etc/host* files do not show any sign of alteration from official mint.
i'm not familiar with python, which one is this package running on?
$ python
python python2.7-config python3.4 python-config
python2 python2-config python3.4m
python2.7 python3 python3m
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
while soundscrape is hung sleeping...
$ ps -A | grep soundscrape
19166 pts/5 00:01:17 soundscrape
$ pstree 19166
soundscrape
$ sudo netstat --all --program | grep 19166
$
no output produced.
more info while hung...
$ top -p 19166
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19166 rod 20 0 164396 59436 40892 D 0.0 0.4 1:29.25 soundscrape
and i found this online "The processes blocked on IO are the ones marked as D in the status column (S column in top)."
$ ps -p 19166 -o wchan
WCHAN
balance_dirty_pages.isra.23
but balance_dirty_pages
is part of the kernel, is it not? wow...
$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
...
2 0 2749396 11265420 554052 567312 0 0 0 0 1552 13514 5 2 92 0 0
0 0 2749396 11261948 555884 569104 0 0 0 10 1658 13651 7 2 91 0 0
1 0 2749380 11257980 557748 571008 8 0 8 11454 1744 14251 7 3 90 0 0
1 0 2749380 11254508 559628 572896 0 0 0 0 1722 14069 6 3 91 0 0
0 0 2749380 11250788 561388 574856 0 0 0 0 1625 13081 6 3 92 0 0
2 0 2749380 11247192 563400 576620 0 0 0 2376 1713 14667 6 2 91 0 0
0 0 2749380 11244048 565088 578324 0 0 0 0 1503 12504 6 2 92 0 0
1 0 2749380 11239948 566832 580080 0 0 0 2212 1667 13200 7 3 91 0 0
0 0 2749380 11236536 568656 581824 0 0 0 0 1578 13558 6 2 92 0 0 <-- soundscrape was downloading up to this point
0 1 2749380 11235204 569044 582544 0 0 0 10 822 8247 3 1 87 10 0 <-- soundscrape started waiting from this point onwards
0 1 2749380 11230716 569044 582544 0 0 0 0 869 2328 4 1 84 11 0
1 1 2749380 11230824 569044 582544 0 0 0 0 614 1263 2 1 86 12 0
0 1 2749380 11230824 569044 582544 0 0 0 0 585 1224 2 1 86 12 0
0 1 2749380 11230748 569048 582540 0 0 0 14 637 1354 2 1 85 12 0
0 1 2749380 11230740 569048 582544 0 0 0 0 695 1593 2 1 86 11 0
0 1 2749380 11230632 569048 582544 0 0 0 0 887 1751 3 1 83 13 0
0 1 2749368 11230500 569048 582544 6 0 6 40 679 1643 4 1 83 12 0
0 2 2749360 11208304 569112 588892 4 0 3506 1296 856 2065 3 2 84 11 0
...
the wa
(Wait IO) column jumped from 0 to 10+...
http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem
$ ps auxf | grep soundscrape
rod 19166 0.6 0.4 136824 70932 pts/5 D+ 13:30 1:42 | \_ /usr/bin/python /usr/local/bin/soundscrape -f https://soundcloud.com/anjunadeep/sets/the-anjunadeep-edition
state D+: D = uninterruptable sleep, and + = in foreground process group. https://www.redhat.com/archives/redhat-install-list/2009-April/msg00007.html
this seems to be a kernel issue, what do you think?
Oh man.. you've gone to deep. I have no idea if any of that is related or not.
Is your system full? Can you give me a df /
?
lol, no my drives are fine! :)
i'm sure this is a kernel issue. and because it started one day out of the blue, i'm pretty sure that if i downgrade my kernel this will stop. i'm not using the recommended linux mint kernel, i'm using one of the latest, i don't remember why. i could just try, but my up times rank in months and i hate to reboot my machine, lol
There used to be something called Splice that would let you change kernels without rebooting. I don't know if that still exists though.
Alternately: no gods, no masters, no uptimes.
what i need to do -because i KNOW (lol) downgrade will work- is to enable tracing in python so that we can see which statement is triggering the issue.
although this is not a bug in your software, maybe that statement could be changed as a workaround for this, without no important negative impact.
so i had a little time... let me first say that i have zero knowledge of python.
i ran this download command to produce a python trace to 'log'. i let it run through the first download, and when finished the script hung, as it always does. i pressed CTRL-C; the ^C code echoed on the screen but the process continued to be hung (because linux process state is 'uninterruptible sleep'). then waited 15 minutes for the process to end. this is the output:
$ python -m trace -t /usr/local/bin/soundscrape -f sashaofficial >log
[################################] 57298/57298 - 00:01:08
^CTraceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/trace.py", line 819, in <module>
main()
File "/usr/lib/python2.7/trace.py", line 807, in main
t.runctx(code, globs, globs)
File "/usr/lib/python2.7/trace.py", line 513, in runctx
exec cmd in globals, locals
File "/usr/local/bin/soundscrape", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/soundscrape/soundscrape.py", line 85, in main
process_soundcloud(vargs)
File "/usr/local/lib/python2.7/dist-packages/soundscrape/soundscrape.py", line 272, in process_soundcloud
id3_extras=id3_extras)
File "/usr/local/lib/python2.7/dist-packages/soundscrape/soundscrape.py", line 368, in download_tracks
artwork_url=track['artwork_url'])
File "/usr/local/lib/python2.7/dist-packages/soundscrape/soundscrape.py", line 956, in tag_file
audio.save()
File "/usr/local/lib/python2.7/dist-packages/mutagen/_util.py", line 98, in wrapper
return func(self, h, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mutagen/_file.py", line 132, in save
return self.tags.save(filething, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mutagen/easyid3.py", line 183, in save
self.__id3.save(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mutagen/_util.py", line 123, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mutagen/_util.py", line 98, in wrapper
return func(self, h, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mutagen/id3/__init__.py", line 591, in save
insert_bytes(f, new_size - old_size, old_size)
File "/usr/local/lib/python2.7/dist-packages/mutagen/_util.py", line 511, in insert_bytes
file_map.move(offset + size, offset, movesize)
KeyboardInterrupt
i have tried this several times and the stack trace always ends in the same file_map.move
call.
the produced log is 450+ MB in size, so i did:
$ tail -n2200 log >log-tail
$ grep soundscrape log-tail
soundscrape.py(921): if chunk: # filter out keep-alive new chunks
soundscrape.py(922): f.write(chunk)
soundscrape.py(923): f.flush()
soundscrape.py(920): for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
soundscrape.py(921): if chunk: # filter out keep-alive new chunks
soundscrape.py(922): f.write(chunk)
soundscrape.py(923): f.flush()
soundscrape.py(920): for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
soundscrape.py(921): if chunk: # filter out keep-alive new chunks
soundscrape.py(922): f.write(chunk)
soundscrape.py(923): f.flush()
soundscrape.py(920): for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
soundscrape.py(925): return path
soundscrape.py(362): tag_file(path,
soundscrape.py(363): artist=track['user']['username'],
soundscrape.py(364): title=track['title'],
soundscrape.py(365): year=track['release_year'],
soundscrape.py(366): genre=track['genre'],
soundscrape.py(367): album=id3_extras.get('album', None),
soundscrape.py(368): artwork_url=track['artwork_url'])
--- modulename: soundscrape, funcname: tag_file
soundscrape.py(943): try:
soundscrape.py(944): audio = EasyMP3(filename)
soundscrape.py(945): audio.tags = None
soundscrape.py(946): audio["artist"] = artist
soundscrape.py(947): audio["title"] = title
soundscrape.py(948): if year:
soundscrape.py(950): if album:
soundscrape.py(952): if track_number:
soundscrape.py(954): if genre:
soundscrape.py(955): audio["genre"] = genre
soundscrape.py(956): audio.save()
soundscrape.py(989): except Exception as e:
soundscrape.py(370): except Exception as e:
the un-grepped 'log-tail' file is available here.
the issue happens in mutagen's EasyID3 library: soundscrape always hangs in the soundscrape.py(956): audio.save()
line. furthermore, the problem seems to be this line in mutagen's lib.
ID3v2 tags (but not v1 tags) are stored at the beginning of the file, and when you tag an untagged file you have to 'scroll' all the file contents backwards to make space at the beginning.
i am completely guessing here, but this file_map.move
call could be scrolling the complete file backwards, and for whatever reason my kernel is doing this in an extremely inefficient fashion. (it used to not be this way with older kernels.) i have an SSD and 100MB read/writes take fractions of a second, not 15+ minutes.
this looks like a problem in mutagen's ID3 library, or python, or the linux kernel. definitely not an issue of soundcrape. do you have contact with mutagen? they might be very interested in taking over this issue now.
for reference:
$ uname -a
Linux [...] 4.4.0-21-generic #37~14.04.1-Ubuntu SMP Wed Apr 20 16:33:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
it looks like python uses C's memmove function to implement mmap's move method: https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L684
this used to work fine but for some reason became horribly inefficient on recent linux kernels (with ubuntu's C lib).
This is so much deeper than I ever expected to go with a SoundCloud downloader! I think having mutagen be fail-safe using temporary files would be a doubly good improvement.
ok, so mystery solved by the @lazka in just a few minutes, lol!
first, a relevant correction: i was downloading to a fuse-mounted NTFS partition on a rotary harddrive, not to the ext4 SSD as i previously thought i was.
this is a known regression of the 4.2+ kernels (i can confirm regression exists on 4.4) involving mmap writes on fuse-mounted partitions. i was using kernel 3.19 before, which is not affected.
the kernel commit fixing this issue and explaining it in depth is here: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=4bc9468f1680e799e3036a6e816ed9ecfc7d98a3
hi,
way older versions of soundscrape used to download one song right after the other. at some point in time this changed, and in my setup download pauses for a long time (tens of minutes) after downloading each song from soundcloud and before starting the next one.
is this a rate limiting behavior that soundcloud requires? or is metadata download (ie: artwork, etc) failing in my setup for some reason (eg: host-based ad blocker)? or is something else fishy in my setup?
thank you!
keywords: pause, delay, pauses, delays