BTW, if someone knows where the importer code is located, please point me to it; I am not familiar with the code base yet.
I tried using numpy-stl to load the file, and it works with no exceptions (I will try to force-log all warnings if possible). And I think it would be a great idea to remove all these `pcross` and `pnorm` helpers and replace them with simple and fast numpy. Which, actually, might be tricky in terms of the whole app. However, we know the 'adapter' design pattern, so we could do it incrementally.
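A minimal sketch of that numpy-stl check (the file name is a placeholder):

```python
# Load the binary STL with numpy-stl and peek at the parsed data.
from stl import mesh

model = mesh.Mesh.from_file("simplified.stl")
print(len(model.vectors))   # number of triangles
print(model.normals[:3])    # first few stored facet normals
```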
Before going to all this work, run it through something that looks at the direction of the surface triangles. I have seen this warning when one of the triangles is "wound" the wrong way and is interpreted as facing inside the object. If that is happening after decimation, then there is a bug in the decimation software (or the triangulation was bad in the first place). If the original was good and the decimation is bad, then that would be an excellent thing to include in a bug report.
Hope this helps,
EBo --
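A minimal sketch of the winding check described above, assuming the facets are available as numpy arrays of vertices and stored normals (not pycam's actual data structures):

```python
import numpy as np

def find_flipped_facets(vertices, stored_normals, tol=1e-8):
    """Return indices of facets whose stored normal points opposite
    to the normal implied by the vertex (winding) order.

    vertices: (n, 3, 3) array of triangle corners
    stored_normals: (n, 3) array of facet normals from the STL file
    """
    computed = np.cross(vertices[:, 1] - vertices[:, 0],
                        vertices[:, 2] - vertices[:, 0])
    # A negative dot product means the stored normal disagrees with the winding.
    dots = np.einsum("ij,ij->i", computed, stored_normals)
    return np.where(dots < -tol)[0]
```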
@ebo , thanks. As I said, the model (attached to the OP) can be opened by all other software without any exception. Secondly, the model does not have invalid triangles like (0, 0, 0) / (0, 0, 0) / (0, 0, 0) -- I checked via a simple numpy script. The binary STL format is pretty straightforward, so I'll try to look at the pycam code. I also validated the normal directions. But I still think it's a bug in pycam, because it also sometimes fails to load STLs generated by Design Spark Mechanical.
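A minimal sketch of that kind of degenerate-triangle check in numpy (assuming the triangle corners are an (n, 3, 3) array):

```python
import numpy as np

def find_degenerate_triangles(tris, tol=1e-12):
    """Return indices of triangles with (nearly) zero area,
    e.g. facets whose corners are all (0, 0, 0)."""
    edge1 = tris[:, 1] - tris[:, 0]
    edge2 = tris[:, 2] - tris[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(edge1, edge2), axis=1)
    return np.where(areas < tol)[0]
```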
@Ruslan, you are welcome. Sorry, I already deleted the OP. I've worked with STL mostly in OFF/COFF format. Looking up code to check the normals, I was reminded of the work testing for "water tight" objects.
Here are some references:
http://www.meccanismocomplesso.org/en/check-stl-files-3d-printing/
https://3dfizzr.wordpress.com/2013/08/05/how-to-check-you-stl-files-before-3d-printing-them/
https://github.com/revarbat/stl_normalize
and a whole list: https://github.com/revarbat/stl_normalize
Unfortunately I have not had to do much of this for decades (not really since I was taking graduate classes in CAGD with Farin).
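For illustration, a rough sketch of the basic "water tight" test those references describe: every edge must be shared by exactly two facets (this assumes the mesh is given as vertex-index triples, which is not how an STL file stores it):

```python
from collections import Counter

def is_watertight(faces):
    """faces: iterable of (i, j, k) vertex-index triples.
    An edge shared by exactly two facets closes the surface;
    anything else indicates a hole or a non-manifold edge."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[tuple(sorted((u, v)))] += 1
    return all(count == 2 for count in edge_counts.values())
```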
@ebo, I'm not sure I know the meaning of the "water tight" idiom, but I know the meaning of code =) And pycam has a dedicated exception for inconsistent normals:
https://github.com/SebKuzminsky/pycam/blob/master/pycam/Importers/STLImporter.py#L159
Thanks for the links, I already read both while trying to understand the issue. And, apparently, I found the bug. Pycam ignores the offset of 84 bytes: 80 for the header and 4 for the triangle count. It never does `file.seek(84)` and starts reading the file from byte 0, which is totally incorrect. I have no idea how it worked before, lol.
```python
if is_binary:
    for i in range(1, facet_count + 1):
        log.warn(f.tell())
```
https://github.com/SebKuzminsky/pycam/blob/master/pycam/Importers/STLImporter.py#L118 prints `0`, which obviously should be `84`.
The fix is really simple: `f.seek(84)`.
I'll send the PR when I'm done with the unit tests for it.
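For reference, a minimal sketch of reading a binary STL with the offset handled correctly (the layout is an 80-byte header, a uint32 facet count, then 50 bytes per facet; this is an illustration, not pycam's actual importer code):

```python
import struct

def read_binary_stl(path):
    """Parse a binary STL file into a list of (normal, vertices) tuples."""
    triangles = []
    with open(path, "rb") as f:
        f.read(80)                                   # skip the 80-byte header
        (facet_count,) = struct.unpack("<I", f.read(4))
        # The file pointer is now at byte 84, where the facet records start.
        for _ in range(facet_count):
            record = struct.unpack("<12fH", f.read(50))
            normal = record[0:3]
            vertices = (record[3:6], record[6:9], record[9:12])
            triangles.append((normal, vertices))     # record[12] is the attribute byte count
    return triangles
```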
I guess that aa258c7bfda6dc0c3a00f2a1bf0fbdfe1b3d7d72 is the culprit (a quite recent "cleanup" change).
Please check if a simple seek would really fix the issue. Thank you!
@Ruslan, for completeness, as far as I know the term "water tight" came from meshes that had small holes in them, and the object prints would not 'hold water'. This also came about when you would process facets which "faced you", which would also generate holes in the end result.
Hmmm... besides it possibly being part of a recent change, there is also the possibility that the header skipped one entire facet. Who knows, but I am glad that you found it ;-)
@unrealsolver:
And I think it will be great idea, to remove all these pcross and pnorm and replace with simple and fast numpy.
I am not familiar with numpy. Thus I would follow your judgement, as long as there is an improvement (better performance or reduced code complexity) and the code readability does not suffer too much. Thank you!
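Just to illustrate what such a replacement might look like (a sketch with hypothetical stand-ins, not the actual pycam helper signatures):

```python
import numpy as np

# Hypothetical stand-ins for hand-written cross-product / vector-length helpers.
def pcross(a, b):
    return np.cross(np.asarray(a, dtype=float), np.asarray(b, dtype=float))

def pnorm(a):
    return np.linalg.norm(np.asarray(a, dtype=float))

# Example: facet normal from two edge vectors.
normal = pcross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(normal, pnorm(normal))  # [0. 0. 1.] 1.0
```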
I would also vote for using numpy replacements wherever possible. Numpy is a rock-solid, stable dependency with thousands of people using it each day. These routines are vetted and maintained, and would be a good replacement. Also, if I am not mistaken, numpy provides python bindings to low-level C implementations. This will typically speed up pure python equivalents by a factor of 5 to 10.
All that said, you have to be careful with how you use numpy:
Do we currently keep any timing information as part of the regression testing?
EBo --
@ebo, as I can see, there are just a few unit tests, and I didn't notice any performance metrics there. Yes, the main idea of moving towards numpy is that it is written in C; however, as mentioned in your links, numpy requires an additional step to convert a python `list` to a numpy `ndarray`, which, actually, should not be a problem.
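That conversion step is just this kind of thing (a sketch: convert the python lists to an ndarray once, then keep the later math in numpy):

```python
import numpy as np

# Vertices kept as plain python tuples, as an importer might produce them.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# One-time conversion; subsequent operations stay in numpy.
arr = np.asarray(points, dtype=float)   # shape (3, 3)
centroid = arr.mean(axis=0)
print(centroid)                         # [0.333... 0.333... 0.]
```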
@sumpfralle, is requirements.txt up to date? Do I have to create a virtualenv for the project?
@Ruslan, maybe we should discuss timing tests. They will take a bit to implement and track, but I have seen them save a project when the timing changed due to a propagated bug. Do you have anything you use to display and track this type of data? But the real question is how much time the overall processing takes, and whether it is worth optimizing the code.
Also, when using numpy I typically set things up to do as little mixing as possible and stick with ndarrays and floats.
@ebo, performance tests will require some infrastructure. I'm not familiar with Travis CI; it would be great if it could do perf charts. Evergreen CI (https://evergreen.mongodb.com/) is also good for running tests against different envs and for performance tests, but it requires some infrastructure to install it on. Graphite, or some similar solution, would require too much effort to set up, I think. PS. If you really want to keep performance under control, you could just check whether the desired function took less than, let's say, 10 ms -- but this is unstable. I'll think about what infrastructure I have and what software could be used to keep performance under control.
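That 10 ms style check could look roughly like this (a sketch; the function under test and the threshold are placeholders, and such wall-clock assertions are machine-dependent):

```python
import time

def assert_fast_enough(func, *args, limit_seconds=0.010, **kwargs):
    """Fail if the call exceeds a (machine-dependent) wall-clock budget."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    assert elapsed < limit_seconds, (
        "%s took %.4fs (limit %.3fs)" % (func.__name__, elapsed, limit_seconds))
    return result
```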
I believe that Travis CI can do timing tests, but I am not experienced with Travis CI (we could not configure it to work at NASA due to agency restrictions and guidelines).
My intern and I developed a new CI/CD tool last summer that meets NASA agency guidelines and restrictions (things like all code and data must be kept behind a firewall until formally released, and only test results are pitched over the wall). It is extremely light-weight and provides basic unit testing, reporting, and dashboarding. We are in the middle of having it reviewed for open-source release, but that process will take as long as it takes. Developing timing tests is something that is on our drawing board. If people are willing to try a new tool (once it is released), then let me know and I will attempt to port the unit tests and see if I can do a decent job at the timing tests.
Is requirements.txt up to date? Do I have to create a virtualenv for the project?
The requirements.txt was sadly out of date. I just fixed this. You do not need a virtualenv.
Regarding the tests: I would recommend opening a separate issue for this topic.
Personally I have the feeling that for now manual runs with `timeit` or similar are sufficient for comparisons between different development steps and branches. But if someone wants to work on timing metrics - please go ahead.
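Such a `timeit` comparison is just a few lines (a sketch; the timed statement is a placeholder to swap for whatever is being compared between branches):

```python
import timeit

# Placeholder statement: substitute the operation you want to compare,
# e.g. a hand-written cross-product helper versus numpy.cross.
elapsed = timeit.timeit("np.cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))",
                        setup="import numpy as np", number=100_000)
print(f"{elapsed:.3f}s for 100k calls")
```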
(I will be able to provide hosting and administration for this kind of purpose - if there should be the need for self-hosted services)
Regarding numpy: I really did not want to encourage time-consuming measurements or complicated analyses. If you think a switch to numpy would help the code in some way without hurting too much: I am open and interested. Please go ahead :)
@sumpfralle, firstly, I would like to work on simple issues. Then, probably, I'll try to do something big. I have no idea what exact performance boost numpy will give us, however I would like to note the following:
I am not a numpy expert either, but I believe that when you start doing lots of linear algebra in python, it's better to use numpy (for machine learning, for example).
@sumpfralle, there are simple tests that can be done, and way-over-the-top tests that could be done. I was advocating starting with the smallest possible set and adding what makes sense and/or is necessary. I do advocate the philosophy that when you find a bug, you write a test that exposes it and verifies the fix, and you make that test part of the test harness.
I will have to look at various profilers and see if I can extract what I need, but my thought is to create a CSV file with the timing test results appended at the end of each commit (or we can choose version/revision/release, etc.). The trick is how to display it over time.
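A light-weight version of that CSV idea might look like this (a sketch; the timed task is a placeholder and git is assumed to be available):

```python
import csv
import subprocess
import time

def append_timing(csv_path, task_name, func, *args, **kwargs):
    """Time one task and append (commit, task, seconds) to a CSV file."""
    commit = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
    start = time.perf_counter()
    func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    with open(csv_path, "a", newline="") as fh:
        csv.writer(fh).writerow([commit, task_name, "%.6f" % elapsed])
```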
re: numpy...
I agree on all counts. I do use it on a daily basis, but note that my use cases are typically shallow. At the moment I am using Dask (which allows for distributed and parallel computing, as well as persistence, delayed out-of-core processing, and other things I am just learning). I think Dask is probably too much to consider for pycam, but Pandas might just be the ticket for some things like tracking timing changes. I will try to give it a think.
@unrealsolver: agreed. In fact I just wanted to say: there is no need to convince me :)
Profiling: I think the light-weight approach of simply storing a CSV file with commit ID vs. runtime of one or more specific tasks should be sufficient. This should be easily possible retro-actively in a script. Personally I would not need more than a simple gnuplot visualization for this CSV data. Storing data permanently alongside commits may be problematic, since performance surely depends on the hardware in use and the version of the python interpreter.
Whoever starts something in this direction will decide what happens :)
I have created an ASCII STL file with 4k triangles, then I simplified it using MS 3D Builder (the default MS program for working with STL). After simplification it appears to have 1.6k triangles, and the format was changed to binary STL. The file works well with all STL readers I tried (Blender, MS, some online tools).
The file: https://drive.google.com/file/d/1uP5-VkIHeNG01WAwTeQy7rKIU-SXRR0q/view?usp=sharing
Stack trace: