Closed davidhalter closed 4 years ago
What format does the diff use?
That is a good question. What would you suggest? Should we make that apparent in the command name as well? (like git_diff
or whatever)
It depends on how the workflow should be. I thought that a refactoring would simply reorder the AST and then rebuild the entire source. But for some clients it could indeed be simpler to apply a diff.
If you choose a diff format, I'd go with a unified diff (diff -u
), it can be applied using patch
.
Another option would simply be to send line numbers and the corresponding content for all lines that changed...
for integrating with let's say editors, probably the best would be diff -u
format. It's also pretty much "standard" format.
@davidhalter OK, I've started to look at the code a little and while I haven't understood how your code is working right now I can say that generally my thoughts on API are this.
First let's say what is Refactoring? I think about refactoring as function of type (using haskell-like notation)
Refactoring :: FilePointer -> Parser -> [Change]
so it takes file pointer, parser and produces change
where FilePointer (should be a better name) - is a pointer to an element of a file that we want to refactor. Refactoring should then use Parser to convert this pointer to some language element, like a function, class of variable etc and after some work that depends on actual refactoring return Changes.
Change should contain the state of file before change, after and the beetween them. Or maybe it would be better to have a function that can generate diff beetween two states.
I think this api is pretty simple and all we can discuss here is what Change should contain. Do we only want the diff or also the state before and after the refactoring.
Also it is probably important how we represent a file pointer. I mean should we load the file in memory as a string so that nobody could temper with it while we working on it or just use something like with open(file)... etc.
So what do you think about that?
@purpleP
so it takes file pointer, parser and produces change
Well, IMO Jedi's refactoring shouldn't produce the change. The IDE itself should be doing the change, because I think file management is the responsibility of the IDE.
Do you agree? I think your whole post was about file management, which I don't intend to do.
@davidhalter No, of course Jedi doesn't need to change file. I think it should provide information in the form of Change object about what files should be changed to perform refactoring and how exactly they should be changed.
So what's your proposal? What do you think should be different?
Well, as I see now the approach you have now is very similar to what I've been proposing. Other than I call Refactoring a function that does refactoring and not the result that is returned. While I see you reasoning for that naming probably "function return refactoring to be performed by editor" I think slightly different about that "jedi performs refactoring and returns changes to be applied by editor". The main question to me right now (I've looked at the code of parser a little, but haven't understood api yet) is how to use parser. Maybe you could provide or point to some examples on that?
@davidhalter I see now that parser isn't used in refactorings so far, so I can't find any examples yet. Also I can't understand how tests are working.
# --- simple
def test1():
#? 7 blabla
test1()
AssertionError
return test1, test1.not_existing
# +++
def blabla():
blabla()
AssertionError
return blabla, blabla.not_existing
This and lines in extract.py inline.py and rename.py look like some mumbo jumbo. PyCharm even shows syntax errors on that files. Also what is not clear to me is how rename can be done without parser, because you need to understand what should be renamed and what isn't
@davidhalter also I don't see requirements.txt or any other form of dependencies management (not listed in setup.py).
@davidhalter Ok, I understand now that Script (not a good name imho) have parser for some reason. and redirects calls to user_stmt to parser. But the parser doesn't return most specific statement for position, it returns the whole expression for me in case
I call Refactoring a function that does refactoring and not the result that is returned.
Well, we could still implement that as well. I think that would even be a good idea, but for now I don't think it's that important. Once we need it, we just add a method Refactoring.refactor
.
@davidhalter also I don't see requirements.txt or any other form of dependencies management (not listed in setup.py).
Jedi doesn't have any dependencies.
This and lines in extract.py inline.py and rename.py look like some mumbo jumbo.
True, it looks a bit weird, however the tests are actually quite straightforward. If it's syntactically incorrect, that is something that should change, of course. The # --- NAME
and # +++
are delimiters to separate tests.
---
starts a test and the code after +++
is the result of the test. #? 25 x
is the line where the test starts (comment line + 1, indent 25). x would then be the rename.
However feel free to create your own test suite. If you like it better to test with py.test directly, I'm very happy with that as well. I don't like these tests as much anymore as I did when I originally created them.
The main question to me right now (I've looked at the code of parser a little, but haven't understood api yet) is how to use parser. Maybe you could provide or point to some examples on that?
Well, don't get too confused about the user_parser module. It's something that you only need if you work with completions and that kind of user interaction (not refactoring, though).
First I would just run the necessary calls (like jedi.usages for rename) and then you just work with that. Try to inspect Definition._definition a bit. You will see that you have access to the name positions (and many times also the parser)
You can also get the current module directly by using module = Script._parser.module()
. (Don't care about protected names, yet). You can e.g. call module.name_for_position
than you modify the Name.value
or Name.prefix
and once you have done that with all of your refactoring you call module.get_code()
to retrieve the changed file.
@davidhalter isn't pytest a dependency?
pytest is a dev dependency and by that the only one (maybe tox as well). I would not want everyone to install it.
BTW: I updated my comment above, please read :)
@davidhalter I agree with that pytest is a dev dependency and I don't like dev dependencies to be installed with setup.py install, but it's actually the way to do it in python. So people could run tests with setup.py test. I see you use travis ci to run tests, right? Haven't used it yet. Can people that install this library run tests somehow before installing?
Anyway, I finally came up with testing scheme. You can see it in latest commit in my fork. So what I do is for every refactoring I will create two folders in tests folder for fixtures. One that is state of files before refactoring and one that is reflects state that should be after refactoring. This folders can have subfolders etc. Every file in this folders should have a unique name so that file from input folder could be matched to zero or one file in the output folder. Then I'm just generating change dict from pairs (input_file, output_file).
Right now this approach can't deal with renaming of files during refactoring, but that's fairly easy. I think that convention that if file should be renamed after refactoring then in output folder we should create a file with name that have name of the input file as a substring
for for example input file is 'foo.py' and after refactoring it should be renamed. We create a file 'new_foo.py' and in tests pass a new name of the file as 'new_foo'.
I hope that this is clear enough.
So after I've created first test with this scheme rename refactoring is failing when I'm renaming function that's used in multiple files. I will fix this some time later.
So after I've created first test with this scheme rename refactoring is failing when I'm renaming function that's used in multiple files. I will fix this some time later.
Nice. It might also be an issue with Jedi's usages itself. That method is definitely not bug free.
Are you developing on master branch? If you do, please merge with linter first (or at least dev, but better linter), otherwise we will have very nasty merges.
I agree with that pytest is a dev dependency and I don't like dev dependencies to be installed with setup.py install, but it's actually the way to do it in python. So people could run tests with setup.py test.
You're right, sorry. I learned something again - thanks!
I see you use travis ci to run tests, right? Haven't used it yet. Can people that install this library run tests somehow before installing?
Travis-ci is only for continous integration. It's not for local testing. You can see the results online when you push. If you want to test multiple versions locally, use tox. It's awesome.
@davidhalter Oh, I've somehow haven't seen that you use master, dev etc flow (like git-flow) and I've branched from master. I'll merge with dev because it's git-flow's way AFAIK. You can see my last commit. I've rewritten rename refactoring and it passes my test now. Previous version somehow haven't changed all files as I've already said. So what is not clear for me now is the api. I don't know how jedi integrates with editors, so I don't really understand what should be returned from refactoring function so that editor could use it to change the source code.
Right now I can see that the 'real' api is change dictionary (I don't understand why we need Refactoring class and new_files
, old_files
methods) that contains for every file it's path after refactoring and content before and after refactoring. Can you explain how they used?
I can't remember refactoring for which this wouldn't be enough. I mean I can't remember refactoring that for example removes a file (and even then we could represent the deletion with None instead of new path). So let's think together is this format is enough?
If it's enough then I have a question why dictionary instead of tuple of tuples. Is indexing by key is used somewhere?
Sorry if it's annoying that I questioning everything, But I never take anything for granted and always trying to understand design choices and think whether they can be improved or not etc. That's just how my brain works I guess.
Right now I can see that the 'real' api is change dictionary (I don't understand why we need Refactoring class and new_files, old_files methods) that contains for every file it's path after refactoring and content before and after refactoring. Can you explain how they used?
If it's enough then I have a question why dictionary instead of tuple of tuples. Is indexing by key is used somewhere?
The questions are very fine. In the first post you can see a sketch of the API. I intentionally left the intialization black, because it does not matter how its implemented. The logic you can see in Jedi right now is very old and I was confused when I looked at it myself. It was probably written the way it was written, because the old refactoring code was very complicated (it had to, because of the old parser). Just change the initialization and the storage to whatever you want - I'm very fine with that.
@davidhalter I don't understand why returning tuple (file_name, (new_file_name, old_content, new_content)) isn't sufficient. Also in your extract function
script._parser.user_stmt()
If I'm trying to go to definition of user_stmt it gives me 'Can't find any definitions for this'
While if I'm trying to do the same thing in PyCharm it gives me list of possible candidates. And AFAIK it suggest valid candidates.
You once said that your parser is better than in PyCharm and if someone find occasions when PyCharm parser is better they should say so, so this is what I'm doing.
Anyway what 'user_stmt' function is doing?
@purpleP
I don't understand why returning tuple (file_name, (new_file_name, old_content, new_content)) isn't sufficient.
I don't understand what you're asking me.
script._parser.user_stmt()
This works for me. Cannot reproduce. If you think this is a serious issue post it with a direct Jedi call as a new issue, then I will investigate again.
@davidhalter I'm asking what is the purpose of Refactoring class. Why not just return list of tuples?
Because it's a horrible API :) Seriously, there's a very good reason no well known 3rd party API in Python just returns complicated tuples: Because nobody understands that. Classes are very well understood and can be documented.
@davidhalter I agree with what you say. But still don't understand why we need object in this case. We can define for example
class FileState(object):
def __init__(self, path, lines):
self.path = path
self.lines = lines
class FileChange(object):
def __init__(self, initial_state, resulting_state):
self.initial_state = initial_state
self.resulting_state = resulting_state
and return list of FileChange objects
Because the object allows for more flexibility. It allows for diffs for example.
@davidhalter I think it just clutters the api. I think API will probably be used by going though the list of changes and applying them. So if we return the list of changes there would be no need to type something like rename(s, pos).changes
it would be just rename(s, pos)
You're talking about diffs, but what diff should return in case api will return the object that is incapsulating the list of changes? Diffs for every file?
I don't understand why not just use something like that
def possible_usage_of_refactoring(s, pos):
[apply_change(ch) for ch in rename(s, pos)]
def apply_change(ch):
d = diff(ch) # or ch.diff
do_something_specific_to_editor_with_diff(d)
If you don't agree with that please explain what editor would need exactly to apply refactoring? Just new file name and diff? What is diff actually? Or new name and new content? I don't understand that.
Also I've opened the issue about goto_definitions, I would appriciate if you would look into it
You're talking about diffs, but what diff should return in case api will return the object that is incapsulating the list of changes? Diffs for every file?
Not diffs for every file. One diff. This obviously includes all the different files. See it's not just about applying a refactoring. There are multiple possible tasks. Someone maybe wants to display the differences first in a diff -u
format, someone wants a different format. Someone wants to directly apply the refactoring. There's a lot of different possible use cases. To allow all of them a class is the way to go.
Also I've opened the issue about goto_definitions, I would appriciate if you would look into it
I've looked into it quickly and I cannot seem to find the issue.
@davidhalter And what this diff for all files should return?
The diff.
@davidhalter Can you give and example, maybe? Because 'Diff should return the diff' isn't explaining anything. At least for me.
I'm trying to work on extract refactoring now. Is there an api to get the 'element' (I don't know how to name that properly) for position? I see that you've used get_statement_for_position, but it return the whole statement, not the current element (which we want to extract, like a function call, or variable name etc).
http://unix.stackexchange.com/questions/73771/what-does-u-of-diff-really-do
This is a diff (diff -u
):
$ diff -U 2 file1.txt file2.txt
--- file1.txt 2013-04-26 09:39:13.496835363 -0400
+++ file2.txt 2013-04-26 09:38:04.838299195 -0400
@@ -1,6 +1,7 @@
The Rain in Spain by
+added extra line here
Servants Poor Professor Higgins!
Poor Professor Higgins! Night and day
He slaves away! Oh, poor Professor Higgins!
-All day long On his feet; Up and down until he's numb;
+All day long On his feat; Up and down untile he's numb;
Doesn't rest; Doesn't eat;
I'm trying to work on extract refactoring now. Is there an api to get the 'element' (I don't know how to name that properly) for position?
No not really, I guess it's easiest to just create it for that use case.
@davidhalter Yeah, I get what diff returns. But my question was what Refactoring.diff should return? A list of tuples (filename, diff)? If so, than I don't see benefits in that. I don't see why we need to return something that would suite everybody's needs. I think we should return something that have all info needed to perform edits for refactoring. And everybody can do what they need with this info. For what I know right now Refactoring class is redundant entity.
Yeah, I get what diff returns. But my question was what Refactoring.diff should return? A list of tuples (filename, diff)? If so, than I don't see benefits in that.
Of course not. It will return the diff for all the files. It returns a STRING. Of course we don't return a tuple. Because when we return a tuple we have the same old issues again that we don't want.
I have tried enough now. I know you think differently. I will not discuss this any further. I have made enough experiences to know that this class will be needed (and also that tuple returns are a very bad idea). We have different opinions - that's ok, but I have to maintain the software for ages, so it's also going to be my problem.
@davidhalter it's not about 'opinions'. I just don't understand what you really want. You want Refactoing.diff to return a string, OK. What the content of this string should be? What I'm trying to get from you is exactly what information refactoring function should provide? Not the format of this information, but what information exactly?
@davidhalter Oh, and I've forgot to ask what version of python do jedi use?
@davidhalter Ok, I've looked closely to your Refactoring class and I understand what diff method is doing, but I have two questions. if old_path is None what unified_diff would return? And why concatenate diffs in one string? Is this format used by some editor? Just curious.
Python 2.6/2.7 and Python 3.2+ (3.5 will be working once someone tackles the issue).
Don't look at my Refactoring code there. As I've mentioned before
And why concatenate diffs in one string? Is this format used by some editor?
It's not used by editors, it's the way how git patches work and how a lot of humans like to read "changes". e.g. git also will give you a diff of multiple files. That's very intentional.
@davidhalter OK, That's what I wanted to know! I've made some changes to the inner api. What do you think about them? I've merged with dev, so just see refactoring.py. I've renamed Refactoring to Changes and refactored it a bit, but the api is the same. I haven't done regression test, but it should work. I think that FileState probably should be merged with Content class by adding path to Content maybe, but still have doubts about it. Also function change_for_refactoring
have been added to incapsulate that for all refactoring we should generate changes and then pass them to the Changes constructor.
What do you think about them?
Could you create a PR? It's way easier to look at the changes there. Because of diffs ;-)
@davidhalter I can, but only renaming works now. I'm working on extract right now (the old version doesn't work) and I haven't tested inline yet.
It's ok, just put a WIP in the title. Pull requests are supposed to be work in progress.
@davidhalter I've opened another pull request to merge with dev as you've asked. But I have another set of questions.
So I need to somehow refactor get_statement_for_position
method of the BaseNode
class.
First thing I've noticed is that you just doing dfs and checking some condition and returning first match.
Node
that involves children could be function on graph.This is probably a bad idea I think, but still it would interesting to hear your opinion on that.
extract
refactoring is the same search just with different condition. This leads to adding this functions.def find(iterable, predicate):
return next((x for x in iterable if predicate(x)), None)
def find_node(start_node, predicate):
return find(traverse(start_node), predicate)
def traverse(start_node):
return chain([start_node], *map(traverse, iter(start_node.children)))
I've kind of mocked node in this test, so this should work on you tree, because interface node.children
is the same.
class MockNode(object):
def __init__(self, _id, children=()):
self.children = list(children)
self._id = _id
@fixture
def tree():
level2_0 = MockNode(_id='2_0')
level2_1 = MockNode(_id='2_1')
level1_0 = MockNode(_id='1_0', children=(level2_0,))
level1_1 = MockNode(_id='1_1', children=(level2_1,))
return MockNode(_id='0', children=(level1_0, level1_1))
def test_traversal(tree):
ids = [n._id for n in traverse(tree)]
assert ids == ['0', '1_0', '2_0', '1_1', '2_1']
def test_find_node(tree):
assert find_node(tree, lambda n: len(n.children) == 1)._id == '1_0'
assert find_node(tree, lambda n: len(n.children) == 3) is None
And what I need now is to refactor get_statement_for_position
method to use this functions and then to add new search function somewhere that would find 'element' for position. How this 'element' properly called by the way?
get_statement_for_position
if c.type not in ('decorated', 'simple_stmt', 'suite') \
and not isinstance(c, (Flow, ClassOrFunc)):
return c
else:
try:
return c.get_statement_for_position(pos)
except AttributeError:
pass # Must be a non-scope
First. I know only one legitimate use for isinstance check in OO code - custom equality checks, but finding node of specific type in AST is probably another one (at least I can't imagine any other way right now, besides Visitor pattern of course, but it would just be more code I think without any benefits), but why nodes have both type
field if they have python type?
Second. What condition should I use to find 'element' for position? This question probably follows from me not understanding what this 'element' is named properly.
PS Oh, and I forgot to ask. I haven't found test for get_statement_for_position
. Could you provide at least a template for such test. It should probably involve constructing AST either by parsing some file or manually I suppose.
PSS Also I'm not sure what does attribute error doing there and how should I refactor it with my find_node functions. I mean I can define function that would check for existence of method in object, but that's just seems wrong. It should be enough to filter by type really. Maybe this code lack some base class or something, I don't know.
@davidhalter I've started to look at your AST parser and while I don't understand why the code is doing what it's doing yet I think it would be beneficial to refactor it a little.
def parents(node):
parent = node.parent
if not parent:
return []
else:
return chain([parent], *parents(parent))
def parent_until(node, classes=(), including_start=True):
ps = list(parents(node))
if not including_start:
ps = ps[1:]
predicate = partial(isinstance, cls=classes)
return find(parents, predicate)
def find_root(node):
return list(parents(node))[-1]
def parent_until_or_root(node, classes=(), including_start=True):
ps = list(parents(node))
if not including_start:
ps = ps[1:]
# is this function really used in this way?
# Why not just have two functions?
if not classes:
return ps[-1]
else:
predicate = partial(isinstance, cls=classes)
return find(parents, predicate)
def defitinitions(node):
def predicate(node, parent):
return (isinstance(node, (Node, Name)) and
node.type == 'testlist_comp' and
parent.type != 'simple_stmt' and
len(node.children) >= 1)
ps = [node] + list(parents(node))
possible_defs = ((scope.children[1] for scope, parent in zip(ps, ps[1:])
if predicate(scope, parent)))
return find(possible_defs, partial(isinstance, cls=CompFor))
def parent_scope(node, include_flows):
is_flow = partial(isinstance, cls=Flow)
def is_scope(node):
return node.is_scope()
if include_flows:
predicate = is_flow
else:
predicate = is_scope
return find(parents(node), predicate)
Basically I've tried to refactor while loops into more functional style, which removed code duplication in parent-finding iteration.
I haven't rewritten the methods themselves because I don't see any test, so I can't really be sure that I've refactored this methods correctly. So I would like if you could provide guidance on what this methods really mean and how to test them.
@purpleP you should probably provide refactorings of unrelated parts of the code as a separate pull request, so that they can be reviewed independently of the refactoring code. Makes it much easier for @davidhalter to review.
@dbrgn Yes, obviously. I just wanted to give a taste what I'm going to do, before I'll do that. The problem is - this code isn't tested AFAIK, and to write the test I will probably have to understand all parsing code.
So I need to somehow refactor get_statement_for_position method of the BaseNode class.
No please don't. I don't think you need to refactor it. What I suggested is that you need a similar function. But this function doesn't need to live in the syntax tree business logic.
My first question is probably stupid, but still. Why not use some graph (or more specific tree) library, that already have many of such functions, dfs, bfs, traversals etc? Your node classes could remain mostly intact, just all methods of Node that involves children could be function on graph.
It's not a graph, it's a tree. Therefore we have something incredibly unspectacular to work with. DFS is something you can write in like 2s. Jedi has very strong performance requirements in terms of memory and performance. A library is therefore probably a bad idea. I think that trees (contrary to graphs) are so easy to work with that you really don't need a library. However it really doesn't matter, since we're stuck with what we're having anyways.
And what I need now is to refactor get_statement_for_position method to use this functions and then to add new search function somewhere that would find 'element' for position. How this 'element' properly called by the way?
This element can be called like 50 different ways. Therefore it will be more important to work correctly with positions. It's probably better to do a different kind of search: Search the leaf under the cursor (careful it might be whitespace, in that case take the next leaf?) and then you look for parents until you are not allowed to anymore (this is limited by a certain set of node names, see https://docs.python.org/3/reference/grammar.html).
I don't think a variation of get_statement_for_position is a good solution.
but why nodes have both type field if they have python type?
Well, because not all nodes have a Python type. Also you seem to come from a very "healthy" world. You talk about legitimate usages and so on. Don't underestimate historical issues. A lot of the things in Jedi are not ideal, because at some point we had to refactor. The refactorings were not always perfect so you have certain issues within the code. The confusion you're talking about is a real one and it's a historic one. I have been wanting to switch to mostly .type
lookups, but the isinstance logic is just still everywhere (and sometimes I actually prefer it).
PS Oh, and I forgot to ask. I haven't found test for get_statement_for_position. Could you provide at least a template for such test. It should probably involve constructing AST either by parsing some file or manually I suppose.
There's like 100 indirect tests. If you mess up the user_stmt some test will start to fail (within Jedi). You don't need to test it separately, I know that's not optimal, but a lot of things aren't.
.PSS Also I'm not sure what does attribute error doing there and how should I refactor it with my find_node functions. I mean I can define function that would check for existence of method in object, but that's just seems wrong. It should be enough to filter by type really. Maybe this code lack some base class or something, I don't know.
I'm pretty sure this AttributeError is just there for checking if it's a Leaf. However you don't have to refactor that function. If you do that's ok. But I will check the complexity of that function and the general time behavior. I don't want these things to get a lot slower.
Basically I've tried to refactor while loops into more functional style, which removed code duplication in parent-finding iteration.
Please don't. I don't like partial and I'm not willing to merge that. I'm all for simplification but not if it goes that way. If you need new functions define them.
e.g.
def find_root(node): return list(parents(node))[-1]
Is not something I want there. The parser is already slow enough (and the parser tree logic). If you need this function please call it get_module
and use a simple while to traverse.
Also you renamed some of the functions (like get_parent_until). I can understand that, but please don't merge that with your current refactoring code. Like @dbrgn said it will get extremely messy and it's not possible to review it that way.
@davidhalter Hi again. I had some time to work on you tree parsing code.
def parents(node):
parent = node.parent
while parent:
yield parent
parent = parent.parent
def find_root(node):
return last(parents(node))
def last(iterable):
last = next(iterable)
for last in iterable:
pass
return last
Maybe I don't know python good enough, but AFAIK this code shouldn't take more time or space than simple while loop. At the same time I've already managed to decrease the code size in functions that I've rewritten by ~10-25 percents. I've also probably found a bug.
def get_previous(self):
"""
Returns the previous leaf in the parser tree.
"""
node = self
while True:
c = node.parent.children
# if we are in the first leaf we will have blabla not in list exeption instead of indexerror because we comparing with self instead of using child, node in while loop
i = c.index(self)
if i == 0:
node = node.parent
if node.parent is None:
raise IndexError('Cannot access the previous element of the first one.')
else:
node = c[i - 1]
break
while True:
try:
node = node.children[-1]
except AttributeError: # A Leaf doesn't have children.
return node
The method with most reduced code size right now is this one
def get_definition(self):
scope = self
while scope.parent is not None:
parent = scope.parent
if scope.isinstance(Node, Name) and parent.type != 'simple_stmt':
if scope.type == 'testlist_comp':
try:
if isinstance(scope.children[1], CompFor):
return scope.children[1]
except IndexError:
pass
scope = parent
else:
break
return scope
And after refactoring
for scope, parent in nwise_iter(chain((self,), self.parents, (None,)), 2):
if (not parent or not isinstance(scope, (Node, Name)) or
parent.type == 'simple_stmt'):
return scope
if parent.type == 'testlist_comp':
next_s = next_sibling(scope)
if isinstance(next_s, CompFor):
return next_s
Almost 50% less code. It's all generator based, so no unnecessary memory consumption.
Talking about slowness of parser.
def assignment_indexes(self):
indexes = []
node = self.parent
compare = self
while node is not None:
if is_node(node, 'testlist_comp', 'testlist_star_expr', 'exprlist'):
for i, child in enumerate(node.children):
if child == compare:
#insert to list is O(n) operation, so this code have quadratic running time.
indexes.insert(0, int(i / 2))
break
else:
raise LookupError("Couldn't find the assignment.")
elif isinstance(node, (ExprStmt, CompFor)):
break
compare = node
node = node.parent
return indexes
If I understand correctly this code had quadratic running time.
def assignment_indexes(self):
"""
Returns an array of tuple(int, node) of the indexes that are used in
tuple assignments.
For example if the name is ``y`` in the following code::
x, (y, z) = 2, ''
would result in ``[(1, xyz_node), (0, yz_node)]``.
"""
def is_list(node):
return is_node(
node, 'testlist_comp', 'testlist_star_expr', 'exprlist'
)
pairs = takewhile(
lambda (ch, p): is_list(p) or not isinstance(p, (ExprStmt, CompFor)),
nwise_iter(chain((self,), self.parents), 2)
)
indices = [(index_in_parent(child, parent) / 2, parent)
for child, parent in pairs if is_list(parent)]
indices.reverse()
return indices
This code on the other hand is just using one reverse call and generators so it's O(n). And it's almost 40% less code. I don't know the semantics of code, so some function names is probably not very good, but still I believe it's more readable.
I will rewrite a few other functions and after that I will try to test performance of old code vs new code. If my changes would make your parser less performant then I would probably abandon my attempt at reducing the code size or it would live in fork only just for my fun. But I doubt that as I only use generators everywhere it's possible. And push this changes with refactoring code would be bad idea indeed obviously, because for now it's completely unrelated. So If all goes well performance wise I will create pull request with refactored code and then another one with refactoring.
Sorry for not answering earlier. I'm very busy at the moment and I find it hard to take out time for Jedi, especially if it's not clearly defined questions. This is a good example: You post a bit of code, but I'm not sure what you actually want me to do. Clearly it's awesome if your code is faster and has less code (if it's still readable). But that's not the point.
If you have something that you would like to get reviewed, just post it in a PR. That would really help. There we could easily discuss the good and bad points. There's just not so much I can say here, because I don't have a very good comparison.
Note: #103 Discussed a lot of things about introducing refactoring. Now that we're a step further, let's discuss the API. This writeup is the result of someone wanting to contribute towards refactoring.
Refactoring
It's been a while, but since I've been ask how to tackle the issue, let's start simple:
The first and easiest refactorings that I'd like to see are:
One of the next steps could be (or basically anything else):
API
I'm suggesting the following API (note that
Script
is given a pointer (line / column index) as well as the path of the file::Of course the bigger work is defining the return class::
Please critize it! I'm very happy for all the feedback, especially if you don't like it. Annotations will of course be removed (and changed to docstrings), because we have to maintain Python 2/3 compatibility.
Testing
Since I have previously started a bit of work on refactoring, I think it would be very beneficial to use that test suite. There's a small list of tests already in
test/refactor/*
(renaming, extract/inline variable). I'm very happy to give pointers about how to use them. Currently they are disabled. Just enable the switch again intest/test_integration.py
.Issues
Currently there's only one known issue (will add to this, if there's more): The parser does error recovery. For now just ignore this, but when we're pushing this to the users, we need to take care of it.
Hints
If you are implementing all of this, you have to use the parser tree's
jedi.parser.tree
logic. If you think that doing something requires a lot of code, you're probably wrong (or the code should just be part of the tree).Generally a refactoring has 3 steps:
rename
should usejedi.Script.usages
;extract
should use no type inference andinline
should work withjedi.Script.goto_definitions
. There's always reasons whyLeaf.value
andLeaf.prefix
and move nodes around (shouldn't cause any issues).Parser.module.get_code()
and profit $$$ !The module
jedi/refactor.py
has a very similar class structure as the one proposed above, but it doesn't work. You can throw away basically anything. It can give you an idea how the structure would look like, but the code is crap. THROW IT AWAY.Please let me know if you need any help to start your work.