Closed AlexRast closed 9 years ago
There are a number of files that this depends upon. Can you send me the whole thing as a zip directly?
You've already got it several times, but here it is again (together with some other unnecessary files, but easiest to just bung all the files in the working directory together)
Further testing. I determined that on my desktop machine (hardys) with a completely fresh manual install of the toolchain, the script ran fine. The problem has been on the (small Lenovo) laptop. So, I did a careful, verging on paranoid-obsessive, and utterly thorough cleanup of the entire install on the laptop, then reinstalled using a fresh install just as I did on hardys. Same result: the simulation crashes. So we are definitely dealing with some behaviour of the toolchain that's host-machine specific (either in the compile of the application binaries or in the subsequent operation of the toolchain). Ugh.
I have tested this with my installation and it works fine. If this is using git master, I would first suspect your gcc compiler. If you are using a version that is provided by the OS, this could be the issue. In particular, we have seen that gcc version 4.9 or above doesn’t appear to work.
I would advise that you use the pre-packaged gcc described here: https://github.com/SpiNNakerManchester/spinnakermanchester.github.io/wiki/2015.004%3a-Little-Rascal-%3a-1.3-C-Development-for-SpiNNaker#DevelopmentDependencies
This has worked on every system that I have ever tried.
Shouldn't failure to work on more recent versions of gcc be considered a (fairly severe) bug? After all, we are designing the whole system to be able to use gcc and if newer versions break our tools, it's up to us to fix the tools - not to ask a mainstream software system to be downgraded. I very much doubt typical users, either, if using newer versions of gcc, will be happy with being asked to use a specific version.
That said, I will look at what version of gcc is installed in any case. I do have the feeling that in fact it is the one recommended in the appnote but it's worth verifying.
David, a question for you, since of all of us you seem to know the most about this - if newer versions of gcc are crashing the software in the failure mode I'm observing, I would hypothesise that this may be due to different underlying treatment of types in the newer gcc versions. Does this sound plausible or would you venture other hypotheses (that we may be able to examine)?
Since gcc releases new versions all the time, we can’t be expected to be up to date with every release as it is made, particularly as older versions tend to remain available as well. This is just one of many dependencies that the software relies upon, and we tend to update as and when necessary.
That said, I am happy for this to be added as a bug to keep track of it, so that we can keep up to date as much as possible. In any case, this isn’t going to be fixed by the time you need it, so I advise making sure that you are using the recommended version at this point.
From: AlexRast [mailto:notifications@github.com] Sent: 09 July 2015 15:36 To: SpiNNakerManchester/sPyNNaker Cc: Andrew Rowley Subject: Re: [sPyNNaker] Processors RTE claiming attempt to configure nonexistent plastic synapses (#107)
I agree with Andrew on this. New versions of gcc (and any other compiler) will often contain bugs and it’s not realistic to insist that all versions are covered at all times.
M
Michael Hopkins, SpiNNaker project, APT group, School of Computer Science, University of Manchester, Manchester M13 9PL michael.hopkins@manchester.ac.uk
Alex,
Versions 4.9.0 and 4.9.1 are particularly buggy. I believe that the underlying software changes for versions 5.0.0 onwards (though I cannot recommend strongly enough avoiding any software with a final .0 in it!).
D
The solution we need to move towards adopting is a configure step before make. This ensures that only permitted combinations of utilities are used.
D.
Yes, we can do a version check during the make in fact, especially if we know certain versions are not going to work. We already check that SPINN_DIRS is defined, so this is just another of these.
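As a rough illustration of such a version check, here is a minimal Python sketch (the real check would presumably live in the makefiles). The function names, the sample version string, and the rule that everything from 4.9 upward is rejected are assumptions taken from the "4.9 or above doesn't appear to work" observation, not the toolchain's actual policy.

```python
import re

# Assumption for illustration: every release from 4.9 onward is treated as
# unsupported, following the "4.9 or above doesn't appear to work" report.
FIRST_UNSUPPORTED = (4, 9)


def parse_gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # Drop the parenthesised vendor string, which may contain other numbers.
    without_vendor = re.sub(r"\([^)]*\)", "", first_line)
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", without_vendor)
    if match is None:
        raise ValueError("no version number found in: " + first_line)
    return tuple(int(part) for part in match.groups())


def is_supported(version):
    """Return True if the version predates the first unsupported release."""
    return version[:2] < FIRST_UNSUPPORTED


# Made-up sample output, only to show the shape of the input.
sample = "arm-none-eabi-gcc (Some Toolchain 1.2.3) 4.8.4 20140725\n"
print(parse_gcc_version(sample), is_supported(parse_gcc_version(sample)))
```

The text passed to `parse_gcc_version` could be obtained with `subprocess.check_output([compiler, "--version"])`, where `compiler` is whichever gcc binary the build uses.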
That seems like a good way of solving the problem for the foreseeable future.
This has been determined to be a problem with the [Threading] option in spynnaker.cfg. The faulty machine had dsg_threads = 5. Successfully reproduced the bug on other machines with dsg_threads = 5. Should be investigated but for now the workaround is to set dsg_threads = 1.
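For anyone wanting to confirm which value their machine is using, a minimal sketch follows. It assumes spynnaker.cfg is plain INI text that Python's `configparser` can read; the helper name `read_dsg_threads` and the example file contents are hypothetical, and only the `[Threading]` section and `dsg_threads` key come from the report above.

```python
import configparser

# Illustrative file contents; only the [Threading] section and the
# dsg_threads key come from the report above.
EXAMPLE_CFG = """
[Threading]
dsg_threads = 1
"""


def read_dsg_threads(cfg_text):
    """Return the dsg_threads value from spynnaker.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("Threading", "dsg_threads")


print(read_dsg_threads(EXAMPLE_CFG))
```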
More info: The affected file is spinnaker.py. dsg_threads is used (in theory) to set up a pool of dsg's in line 794:
thread_pool = ThreadPool(processes=no_processors)
where no_processors is the number specified in dsg_threads. Unfortunately the ThreadPool API (multiprocessing.pool.ThreadPool) is undocumented. There is an open Python bug (17140) on this lack of documentation. Hence its behaviour is unclear. The bug report suggests that the ThreadPool probably behaves similarly to the (mostly documented) Pool object. Even here however there is an ambiguity. thread_pool itself is used to run apply_async (line 820):
thread_pool.apply_async(data_generator_interface.start)
but the documentation for apply_async is unclear. Specifically, it claims:
"apply_async(func[, args[, kwds[, callback]]])
A variant of the apply() method which returns a result object.
If callback is specified then it should be a callable which accepts a single argument. When the result becomes ready callback is applied to it (unless the call failed). callback should complete immediately since otherwise the thread which handles the results will get blocked."
and the documentation for the apply() method claims:
"apply(func[, args[, kwds]])
Equivalent of the apply() built-in function. It blocks until the result is ready, so apply_async() is better suited for performing work in parallel. Additionally, func is only executed in one of the workers of the pool."
So is func executed in one or in all of the workers in the pool, in the case of apply_async? The documentation is not explicit on this point and the bug report again indicates further uncertainty in any case. I therefore recommend that this feature be disabled until Python gives us clear documentation on what to expect!
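One way to settle the question empirically is a small experiment. The sketch below submits a single task to a five-worker ThreadPool and counts how often it runs. The `start` function is a hypothetical stand-in for `data_generator_interface.start`, not the real sPyNNaker code. In CPython each `apply_async` call queues one task, which is picked up by exactly one worker.

```python
import threading
from multiprocessing.pool import ThreadPool

calls = []
calls_lock = threading.Lock()


def start():
    """Stand-in for data_generator_interface.start (hypothetical body)."""
    with calls_lock:
        calls.append(threading.current_thread().name)


thread_pool = ThreadPool(processes=5)
result = thread_pool.apply_async(start)
# get() blocks until the task is done and re-raises any exception from start()
result.get(timeout=10)
thread_pool.close()
thread_pool.join()

print(len(calls))  # number of times start() ran
```

Note that an exception raised inside the submitted function is only surfaced when `get()` is called on the returned result object, so code that discards the result can fail without any visible error.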
Hi Alex, we actually kind of did disable it with threads = 1.
We discovered that DSG execution with threads was actually slowing us down (so counter-intuitive, I know, but it seems that the effort of the interpreter saving and switching state was more than enough of a drain to make it not worthwhile).
By turning it to 1, we basically become serial again.
I hope that helps you out, and at least should make you feel more comfortable reducing threads to 1, given that higher numbers give you a drop in performance.
Alan
You can also disable it more terminally by merging this guy into your working branch:
https://github.com/SpiNNakerManchester/sPyNNaker/commit/3f7e0226210a0aac9103089eb63e4b68904c3426
So I'm just going through the open issues and seeing if I can fix anything. Given that the original issue covered here was deduced to be a thread issue and has a solution, I think it's worth closing this issue. Please reopen if you disagree.
This issue is happening for me as well. I would bet fairly large amounts of money that threading in the DSG has nothing to do with it as it LOOKS like memory trampling coming from something buffered-in related as it's triggered by simply changing the input pattern presented to my network via buffered in.
Even when the network does work, I get screeds of these errors at the end:
SpinnmanInvalidPacketException: Invalid packet of type <class 'spinnman.messages.eieio.command_messages.eieio_command_message.EIEIOCommandMessage'> received: The command packet is invalid for buffer management: command id 64
File "/local/knightj/spinnaker_git/sPyNNaker/spynnaker/pyNN/buffer_management/buffer_manager.py", line 131, in receive_buffer_command_message
"command id {0:d}".format(packet.eieio_header.command))
Packet Callback Error:Invalid packet of type <class 'spinnman.messages.eieio.command_messages.eieio_command_message.EIEIOCommandMessage'> received: The command packet is invalid for buffer management: command id 64
This should be fixed in master – there was something printing buffer overflows to IO_STD. The current tag allocation allows tag 0 to be used for things other than IO_STD, so I changed the print to use IO_BUF (we are using this for everything else). There is a case to be made for disallowing tag 0 in tag allocation, but this is a minor issue.
Just a thought: if a model is using IO_STD, it should declare so, for then the tag allocator could allocate it and the issue would disappear. Just out of curiosity, what was/is causing a buffer overflow? These buffers are being defined at host and transmitted; my instinctive reaction is that things shouldn't be overflowing.
I think not allocating tag zero would be the KISS answer to this (https://en.wikipedia.org/wiki/KISS_principle). Sadly, the buffers in question are spike buffers, not buffered-in buffers.
Yes, the message was related to the overflow of the spike inputs i.e. too many spikes are coming in to be processed. Note that the buffer doesn't actually overflow; no memory is used that shouldn't be. The extra spikes are just discarded.
I would be quite happy to reserve tag 0 as this is a system function. It is made more complicated by the fact that you don't need to set up tag 0 i.e. you can use IO_STD without having set up anything to listen for it. So reserving tag 0 is the easy option.
Whereas I would usually agree with keeping things simple, we are trying to build software with a minimum of built-in assumptions, and thus KISS behaviours should be discussed.
Given we are moving to iobuf instead of using iosnd, limiting the resource allocator to ignore a tag just for an issue that is then hidden would make it more difficult to deduce in future. Think of a student three years from now wondering why we don't use tag 0. Having a way to explicitly define this would allow the allocator to work, and would reveal this behaviour in a friendly way.
Assume in the future an application which wants all 8 tags but is not allowed them because tag 0 has been allocated to the system but isn’t being used. I'd consider that a bug. Smart resource allocation should be aimed for, not disregarded just because it's easy to cut it out.
It's also worth noting what happens if iostd changes to use tag 2: then we need to switch the hard-coded allocation. Giving the software the chance to handle it would be cleaner in my opinion.
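To make the two positions concrete, here is a minimal Python sketch of an allocator that keeps the IO_STD tag free only when a model declares that it uses IO_STD. The function `allocate_tags`, the `uses_io_std` flag, and the constants are hypothetical names for illustration, not the real sPyNNaker tag allocator. The count of 8 tags and tag 0 for IO_STD are taken from the discussion above.

```python
N_TAGS = 8       # the thread mentions an application wanting "all 8 tags"
IO_STD_TAG = 0   # tag that IO_STD output uses, per the discussion


def allocate_tags(n_requested, uses_io_std):
    """Return a list of tag ids, keeping IO_STD_TAG free only when needed."""
    reserved = {IO_STD_TAG} if uses_io_std else set()
    available = [tag for tag in range(N_TAGS) if tag not in reserved]
    if n_requested > len(available):
        raise ValueError(
            "requested {} tags but only {} are available".format(
                n_requested, len(available)))
    return available[:n_requested]


print(allocate_tags(8, uses_io_std=False))  # all tags, including tag 0
print(allocate_tags(3, uses_io_std=True))   # tag 0 left for IO_STD
```

Always reserving tag 0 corresponds to calling this with `uses_io_std=True` every time. With the declared variant, a change of the IO_STD tag would only require updating `IO_STD_TAG`.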
Just a side topic that's just popped into my mind. We're using (or used) iobuf/iostd to record this overflow. Would it be better to store this in a variable tracking the number of spikes that we lost through overflow and write it to a register or a provenance region at the end of execution? The reason I've just thought of it is that this is one of the entry places where packet loss can be recorded.
Agreed - I think all of this type of stuff should be moved to the 'provenance' region; reading the IO_BUF is hardly ideal.
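A minimal sketch of the counting idea, written in Python for brevity (the on-chip code is C, and the class and field names here are hypothetical): a fixed-capacity input buffer that discards spikes when full, as described above, but keeps a count that could be written to a provenance region at the end of the run instead of printing a message per overflow.

```python
class SpikeBuffer:
    """Fixed-capacity input buffer that counts the spikes it has to discard."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.spikes = []
        self.n_dropped = 0  # candidate value for a provenance region

    def add(self, spike):
        """Store the spike, or discard it and count the loss when full."""
        if len(self.spikes) >= self.capacity:
            self.n_dropped += 1
            return False
        self.spikes.append(spike)
        return True

    def provenance(self):
        """Return the counters to write out at the end of execution."""
        return {"n_spikes_dropped": self.n_dropped}


buffer = SpikeBuffer(capacity=2)
for key in range(5):
    buffer.add(key)
print(buffer.provenance())
```

No memory beyond the buffer's capacity is touched; the extra spikes are simply dropped and tallied.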
Believe this to be fixed now - was an issue with the config file
Another new problem with the well-known visual attention network. In the latest version of the toolchain, if you try and run this script, various processors run-time-error. An iobuf in ybug reveals many messages of the form "...there should be no plastic synapses!...". The problem happens every time you try to run the script. Note that in the failing run plasticity_on is set to False so there should be no plastic synapses instantiated.
Quick fix needed; we will need this network for a workshop in Sestri Levante at the end of July - absolute last date to fix 18 July.
Script below, not including all the auxiliary files (but most in the core dev team will already have these support files anyway)