GEOUNED-org / GEOUNED

A tool to convert CAD to CSG & CSG to CAD for Monte Carlo transport codes
European Union Public License 1.2
45 stars 27 forks source link

Python conventions #34

Open repositony opened 3 months ago

repositony commented 3 months ago

Would it be possible to take some time and tidy up the code to conform closer to the PEP-8 standards? (seems to be styled as a mix of c++/fortran)

This would help contributers understand the code more quickly and not have to think about all the inconsistencies between files and modules.

I can do this in a PR if you need. VSCode will make this much less of a manual chore so it should not take long, but I'm not yet familiar with how everything is supposed to work together and may break something. Either way, here are a few examples of things we could do.

Modules

The convention is short, all lowercase names for modules (files) with underscores if needed. This is currently very inconsistent, particularly in the Utils directory.

| Utils
  |__ __init__.py
  |__ BasicFunctions_part1.py
  |__ BasicFunctions_part2.py
  |__ booleanFunction.py
  |__ BooleanSolids.py
  |__ Functions.py
  |__ Geometry_GU.py
  |__ QForm.py

These should be regrouped and renamed to make it obvious what they do, but purely for consistency at least something like this:

| utils
  |__ __init__.py
  |__ basic_functions_part1.py
  |__ basic_functions_part2.py
  |__ bool_function.py
  |__ bool_solids.py
  |__ functions.py
  |__ geometry_gu.py
  |__ qform.py

I think that splitting code into basic_functions_part1.py and basic_functions_part2.py makes it difficult to understand what is going on, and becomes more confusing when you also have functions.py. Again on the naming, functions.py is perhaps a poor name for a file containing more classes than functions.

Classes and functions

The classes are mostly PascalCase which is good but sometimes this is also inconsistent. For example:

Functions vary a lot, sometimes being camelCase, sometimes PascalCase, and rarely with underscores. The convention is all lowercase with underscores. For example, within the same LoadFile>LoadFunctions file we have:

# These functions should be renamed
def GetLabel():
def getCommentTree():
def set_docOptions():

# something like this 
def get_label():
def get_comment_tree():
def set_doc_options():

Same goes for variable names

# these variables
def checkEnclosure(FreeCAD_doc, EnclosureList):
def nextIndex(docList, lastIndex=None):

# should be something like this
def check_enclosure(freecad_doc, enclosure_list):
def next_index(doc_list, last_index=None):

Even better would be to have the type hints for better documentation and helpful hints, but not too important.

# with hints
def some_function(any_string: str) -> int:
    return len(any_string)

Variables and regex

I would put all precompiled regex into the same module, but the Write>StringFunctions.py file is a good example

# module variables
celmat = re.compile(r"(?P<cel>^ *\d+) +(?P<scnd>(\d+|like))", re.I)  
grlnumber = re.compile(r"[-+]?(\d+\.\d+|\.\d+|\d+\.?)(e[+-]\d+)?", re.I)
param = re.compile(r"((^|\n {5})[\(\):\-\+\d+\.\# ]*)([\*a-z])", re.I)  
likebut = re.compile(r"but", re.I) 
trans = re.compile(r"trcl|fill= *\d+[ c\$\n]*\(,", re.I) 

# should be explicit and snake_case
cel_material = re.compile(r"(?P<cel>^ *\d+) +(?P<scnd>(\d+|like))", re.I)  
general_number = re.compile(r"[-+]?(\d+\.\d+|\.\d+|\d+\.?)(e[+-]\d+)?", re.I)
param = re.compile(r"((^|\n {5})[\(\):\-\+\d+\.\# ]*)([\*a-z])", re.I)  
like_but = re.compile(r"but", re.I) 
transform = re.compile(r"trcl|fill= *\d+[ c\$\n]*\(,", re.I) 

For constants these should be SCREAM_CASE/CONSTANT_CASE, for example in GEOReverse/Modules/Parser/parser.py the version is a constant, and should be treated as such unless it changes somewhere.

# should capitalise this
version = "3.6"
VERSION = "3.6"

As a side note, this could probably be a value and should not be duplicated in PartialFormatter.py because it would be very easy to forget to change one of them. Also hardcoding a python version like this in the source code that is different to the 3.8 minimum seen in the pyproject.toml may also lead to confusion.

Enumeration

It could also be worth looking into enumeration for things with a few distinct variants. For example:

 def compOperator(self):
       if self.operator == 'AND': 
          return 'OR'
       else:
          return 'AND'

What if self.operator was set to lowercase? what if it is anything but "AND"? You end up with "AND" no matter what. Especially problematic as this is initialised to None by default.

class Operator(Enum):
    AND = auto()
    OR = auto()

    @property
    def complement(self):
       if self == Operator.AND: 
          return Operator.OR
      elif self == Operator.OR:
         return Operator.AND
      else:
        raise Exception("The operator {self} was not recognised") 

# so that whenever you need the complement
 def compOperator(self):
    return self.operator.complement

Note great code I've written as there are better ways of handling this, but just an example.

Final Note

I think just a few changes to the style of the source code would help new contributers a lot. The fork I have on some internal UKAEA servers started to do a lot of this to make it easier to follow while we try to understand the code and add new things. Of course, any bugs or useful changes will be fed back to your master branch here.

However, I understand that this is a lot to think about and the original authors will be used to the current structures and code formatting. Feel free to close/delete the issue if this is not the direction you want to take GEOUNED.

repositony commented 3 months ago

Also while I think of it, I'm not sure why the main __init__ accesses the underlying class dictionary directly, it should just be something like this (again with the snake_case members):

class Geouned:
    def __init__(self, title="Geouned conversion"):
        self.step_file = ""
        self.geometry_name = ""
        self.material_file = ""
        self.output_format = ("mcnp",)
        self.title = title
        self.void_gen = True
        self.debug = False
        self.comp_solids = True
        self.volume_sdef = False
        self.volume_card = True
        self.dummy_material = False
        self.universe_card = None
        self.simplify = "No"  # Why is this not a boolean True/False?
        self.cell_range = []
        self.export_solids = ""
        self.min_void_size = 200  # units mm
        self.max_surfaces = 50
        self.max_bracket = 30
        self.void_material = []
        self.void_exclude = []
        self.start_cell = 1
        self.start_surface = 1
        self.cell_comment_file = False
        self.cell_summary_file = True
        self.sort_enclosure = False
repositony commented 3 months ago

General structure is good, the src and pyproject.toml is the modern way of setting up packages for avoiding import issues and being more agnostic about the builder.

Just another thought, but I think the code should have two main interfaces:

Therefore the src directory could also probably look something like this to separate them out a bit:

GEOUNED/
|- docs/
|- tests/
|- src/
|   |- scripts/                     <--- Command line scripts
|       |- geouned.py
|       |- georeverse.py
|   |- geouned/                     <--- The Geouned package
|       |- conversion/
|           |- __init__.py
|           |- cell_definition.py
|           |- etc...
|       |- etc ...
|   |- georeverse/                  <--- The GeoReverse package
|       |- etc ...
|- LICENSE
|- README.md
|- pyproject.toml
alberto743 commented 2 months ago

Thank you for this PR. I may help as well, if needed.

shimwell commented 2 months ago

I've read through the code today. I still agree with everything in this thread.

I think we might need to go in small steps to avoid making massive PRs and keeping things easy to review and consider one change at a time.

For example if we change the arg names to snake_case and the class name to camel case and move to a standard class init layout all in a single PR then that will have large changes throughout the code base.

repositony commented 2 months ago

to avoid making massive PRs

Completely agree, one of the reasons I've not done anything with this yet.

Do you have suggestions for how we go about a bit of a refactor/tidy? Probably best to keep everything in one place, so perhaps a pep8 or refactor remote branch would be appropriate so that we can all help without interfering with the main branches. Though we should see how @psauvan and @jpcatalanUNED want to handle things first.

Also just throwing it out there, but I've also been really liking dataclasses recently as a clean way of defining things and avoiding some boilerplate. Might be worth considering if the 3.8 minimum version in the pyproject.toml is correct because I think they are ok for anything >=3.6

shimwell commented 2 months ago

keen to volunteer some minimal CI to safeguard things a bit