SeiictyUsui / pydicom

Automatically exported from code.google.com/p/pydicom

Write vs Read speed [originally related to Decimal slow in 0.9.7] #114


GoogleCodeExporter commented 8 years ago
On a large RT structure set file, I noticed a huge difference between read and 
write performance: well under 1 s on read and around 40 s on write. I have 
attached the file and a test script.  

Original issue reported on code.google.com by patricem...@gmail.com on 9 Apr 2012 at 8:23


GoogleCodeExporter commented 8 years ago
I've tried and confirmed the issue (in fact the time spread is far worse on my 
system).

Some preliminary profiling confirms my suspicion that the bottleneck is not 
really the writing; it is the conversion of 'raw data elements', in particular 
those with a VR of DS (which the example file has a _lot_ of) in pydicom 0.9.7. 
Python's Decimal is quite slow compared with float [1]. There is a related (I 
think) issue raised on the dicompyler group [2].

I'm going to look into a way to go back to using float rather than Decimal for 
DS, and perhaps only check and warn on writing the values, rather than when 
setting them.

Another solution, which should be done anyway, is to not convert raw data 
elements before writing them. Then only those data elements touched by the user 
code would be hit with this speed penalty.

[1] http://stackoverflow.com/questions/195116/python-decimal
[2] 
http://groups.google.com/group/dicompyler/browse_thread/thread/217e3d304f55cac2
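To illustrate the gap described above, here is a small stand-alone benchmark (not pydicom code; the DS-like strings are invented for illustration) comparing float and Decimal parsing with timeit:

```python
import timeit
from decimal import Decimal

# A batch of DS-like (decimal string) values, as they might appear
# by the thousands in an RT structure set's contour data.
values = ["-12.345678", "0.9765625", "250.0", "1.5e-3"] * 250

def parse_float():
    return [float(v) for v in values]

def parse_decimal():
    return [Decimal(v) for v in values]

t_float = timeit.timeit(parse_float, number=100)
t_decimal = timeit.timeit(parse_decimal, number=100)
print("float:   %.4f s" % t_float)
print("Decimal: %.4f s" % t_decimal)
```

(Note that on Python 3.3+ the decimal module is C-accelerated, so the gap is narrower there than it was when this issue was filed.)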

Original comment by darcymason@gmail.com on 9 Apr 2012 at 10:13

GoogleCodeExporter commented 8 years ago
Changed title to point out connection to Decimal issues.

Original comment by darcymason@gmail.com on 16 Oct 2012 at 7:13

GoogleCodeExporter commented 8 years ago
I recently came across this issue too and found that it can be mitigated 
somewhat by using the cdecimal [1] library, which is a drop-in replacement for 
the decimal module in the standard library. The write operation from Patrice's 
test script is still quite slow, but using cdecimal cuts the run time by about 
a factor of 2 on my (slow) work machine (100 s vs 45 s).

To use the cdecimal module in place of the decimal module, simply add:

import sys
sys.modules["decimal"] = __import__("cdecimal")

*before* you import pydicom (or any other module that makes use of decimal).

This doesn't eliminate the issue entirely but I thought I'd mention it in case 
it is of any use to someone.

[1]http://www.bytereef.org/mpdecimal/index.html
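A defensive variant of the swap above, which falls back gracefully when cdecimal is not installed (on Python 3.3+ the standard decimal module is already C-accelerated, so cdecimal is unnecessary there):

```python
import sys

try:
    # Replace the stdlib decimal module with the faster cdecimal,
    # if available (Python 2 only; cdecimal was merged into the
    # stdlib decimal module in Python 3.3).
    sys.modules["decimal"] = __import__("cdecimal")
except ImportError:
    pass  # fall back to the standard decimal module

# Any later "import decimal" (e.g. inside pydicom) now picks up
# whichever module won above.
import decimal
print(decimal.Decimal("1.5") + decimal.Decimal("0.25"))  # -> 1.75
```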

Original comment by randle.taylor on 17 Oct 2012 at 5:30

GoogleCodeExporter commented 8 years ago
DS conversion from raw has been improved by approximately a factor of five over 
the few revisions up to revision d78174a8d9de. This is based on the raw-convert 
part of the profiling tests in raw_convert_test.py (test/performance 
subdirectory).

The speed improvements included taking DS back to a float-based representation 
(by default), although the Decimal-based DS from pydicom 0.9.7 can still be 
switched on if desired. The Decimal version is about 3.5 times slower, though 
with randle's cdecimal trick it comes down to about 1.8 times slower than the 
float representation in my testing.

I've left the issue open to see if further profiling can still help.

Original comment by darcymason@gmail.com on 23 Oct 2012 at 3:17

GoogleCodeExporter commented 8 years ago
More improvements up to revision 78ba350a3eb8: now approximately a 10-fold 
speed improvement for raw_convert_test compared with the pydicom 0.9.7 release, 
and approximately 15-fold if the raw conversion is forced directly (using 
dataset.walk()) rather than through str(dataset).

There is probably little else that can be done to improve this, except to avoid 
the conversion entirely, which may be picked up again later, so I'll leave the 
issue open for that.
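For anyone wanting to repeat this kind of measurement, a generic harness along these lines (the workload function here is a stand-in for illustration, not pydicom's actual raw_convert_test.py) will show where the time goes:

```python
import cProfile
import io
import pstats

def convert_values(strings):
    # Stand-in workload: string-to-float conversion, the hot spot
    # identified in this issue.
    return [float(s) for s in strings]

data = ["3.14159", "-0.5", "100.0"] * 10000

profiler = cProfile.Profile()
profiler.enable()
convert_values(data)
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```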

Original comment by darcymason@gmail.com on 24 Oct 2012 at 2:00

GoogleCodeExporter commented 8 years ago
I have just updated to version 0.9.7 and noticed this issue, but I noticed it 
on read (I don't write the data out later). For example, in the test.py code 
above, if you add repr(ds) after reading in ds, the read time dominates:

Without repr(ds):
  Reading ... done in 0.47 s.
  Writing ... done in 140.89 s.

With repr(ds):
  Reading ... done in 141.75 s.
  Writing ... done in 2.45 s.

BTW, how do I install a specific revision? Can I use easy_install? I would 
like to test with the revisions that have been added.

Original comment by reese.ha...@gmail.com on 16 Nov 2012 at 6:48

GoogleCodeExporter commented 8 years ago
That is the expected behavior.

The slowdown results from the conversion of a RawDataElement to a DS 
DataElement. When the dataset is initially read in, the data elements remain as 
RawDataElements until they are accessed. 

They are accessed in three scenarios: 
1) The data element is accessed directly (e.g. dataset.PatientsName)
2) The data element value is accessed indirectly (in your case via repr(), 
which requires the value of every data element)
3) The dataset is written to a file (each element is converted from raw, then 
written). 

In your case, you are converting from raw when you call repr(ds), therefore the 
conversion is unnecessary when writing to file. Normally, if you just read in 
the file and then wrote it back to the file, the conversion would happen inside 
the write method.
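The three scenarios all reduce to the same lazy-conversion pattern. A minimal sketch of the idea (illustrative only; this is not pydicom's actual RawDataElement/DataElement API, and the element names are invented):

```python
class LazyDataset:
    """Holds raw string values and converts them only on first access."""

    def __init__(self, raw):
        self._raw = dict(raw)        # tag -> undecoded string value
        self._converted = {}         # tag -> decoded value cache

    def __getitem__(self, tag):
        # Scenarios 1 and 2: conversion happens on access, then is cached.
        if tag not in self._converted:
            self._converted[tag] = float(self._raw[tag])
        return self._converted[tag]

    def write(self):
        # Scenario 3: writing touches every element, converting any
        # that were never accessed before.
        return {tag: self[tag] for tag in self._raw}

ds = LazyDataset({"SliceThickness": "2.5", "PixelSpacing": "0.97"})
print(ds["SliceThickness"])   # first access converts: 2.5
print(ds.write())             # remaining elements converted here
```

Accessing every element up front (as repr(ds) effectively does) simply moves the conversion cost from the write step to the read step, which matches the timings reported above.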

---------------------------

In order to install a specific revision from the development repository, you 
will need to install Mercurial and run the following:

hg clone https://code.google.com/p/pydicom
cd pydicom
hg update <revision_number>

python setup.py install

NOTE: The repository is used for development, so bugs may be introduced between 
releases. If you are using pydicom anywhere you require stable code, please use 
the release versions.

Original comment by Suever@gmail.com on 16 Nov 2012 at 7:02

GoogleCodeExporter commented 8 years ago
Is there a way to speed this up besides going back to the previous release 
version? (I have made all the Beams-to-BeamSequence etc. changes, which I 
prefer.) I have a loop that is gathering data for plotting that has really hit 
a speed bump. I have already added the cdecimal replacement. Anything else I 
could try, since this is "mission critical"?

Original comment by reese.ha...@gmail.com on 16 Nov 2012 at 7:15

GoogleCodeExporter commented 8 years ago
>> Is there a way to speed this up besides going back to the previous release 
version?

I've just posted a release candidate for a pydicom 0.9.8 release, which 
includes the faster DS performance (using float values by default, not 
Decimal). It is not extensively tested yet beyond the usual unit tests and 
example scripts, but as far as I can tell it is stable. You could try 
installing that; easy_install or pip should pick the new version up.

Original comment by darcymason@gmail.com on 19 Nov 2012 at 5:09

GoogleCodeExporter commented 8 years ago
Thank you. I have tried the release candidate and everything appears to be 
working smoothly, and speed has increased by about a factor of 10 compared to 
the 0.9.7 version. No issues to report so far.

Reese

Original comment by reese.ha...@gmail.com on 20 Nov 2012 at 3:33