Proxmark / proxmark3

Proxmark 3
http://www.proxmark.org/
GNU General Public License v2.0

Inconsistent Serial Communication with PM3 on macOS #283

Closed digitalentropy closed 6 years ago

digitalentropy commented 7 years ago

Okay, this is a hard issue to describe. I've been trying to find a good way to document it before opening this issue, but have been unsuccessful so far. Perhaps I can borrow a USB protocol analyzer from a colleague and see if I get any good data that way.

For quite some time now, at least two years, whenever I've compiled the client on macOS I've observed what seems to be the client going "out of sync". When I first open the client, the first two or three commands execute fine, but each successive command takes longer and longer to respond, until I'm waiting upwards of 10-15 seconds for something as simple as "lf search" to finish. The delay appears to affect all commands, both LF and HF. My current "workaround" is to exit and restart the pm3 client every few commands, which is less than ideal.

It could be an issue with a library used on macOS, since I've never observed this when compiling on other platforms or within a VM.

With the latest code the problem has gotten particularly bad, which is what prompted this ticket, so I will do my best to document and isolate it further.

digitalentropy commented 7 years ago

Just a follow-up.

I did a quick test using only "lf search" as a test command, and found that as long as I keep running commands in quick succession, with minimal time in between, I don't observe any immediate problems.

If, however, I run one command, then wait five seconds and run it again, there's a huge delay. The longer I wait, the worse the delay is.

marshmellow42 commented 7 years ago

Well, there isn't much that's simple about the lf or hf search functions. Can you confirm with a specific tag read command? Just to narrow the scope a little.

digitalentropy commented 7 years ago

Yeah, I was thinking the same thing even as I was typing it. I know it's basically a fancy script. :-)

It happens with all commands, even individual tag commands, independent of LF or HF. I'm currently cleaning up my dev environment and starting from scratch to see if I can narrow it down to a specific library version.

digitalentropy commented 7 years ago

Okay, I started from a clean slate with just the Homebrew tap. Same problem. Now testing by manually building from source against Homebrew libraries.

digitalentropy commented 7 years ago

Finished compiling manually using the Homebrew instructions (no tap).

Same problem. The test command I'm using now is "lf hid read 1". When first opening the client, the command takes less than a second to execute. After waiting 30 seconds and running the command again, it takes 6 seconds to execute and get response back.

Wait a bit longer and it's up to 8 seconds.

digitalentropy commented 7 years ago

Same problem using only MacPorts.

"lf hid read 1" took 8 seconds before I got a response back.

As an added bonus, the version I compiled using MacPorts segfaults (code 11) when exiting normally.

iceman1001 commented 7 years ago

Hm, I've heard about this behavior before, but wasn't that related to using the pm3 GUI and its logfile?

However, in the iceman fork I made some serial communication changes. Would you mind testing it to see if you still get the same slowdowns? You need to pull the latest code, compile, and flash before doing your tests.

digitalentropy commented 7 years ago

Attempted to compile, but it doesn't like something about my readline library. I don't get the same error with the current pm3 master:

ld: warning: directory not found for option '-L/usr/local/opt/readline/lib'
Undefined symbols for architecture x86_64:
  "_rl_copy_text", referenced from:
      _PrintAndLog in ui.o
  "_rl_readline_state", referenced from:
      _PrintAndLog in ui.o
  "_rl_replace_line", referenced from:
      _PrintAndLog in ui.o
  "_rl_restore_prompt", referenced from:
      _PrintAndLog in ui.o
  "_rl_save_prompt", referenced from:
      _PrintAndLog in ui.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [proxmark3] Error 1
make: *** [client/all] Error 2

digitalentropy commented 7 years ago

Iceman, your fork doesn't compile with MacPorts libraries but works fine with Homebrew. Switched back to Homebrew for now (I prefer it anyway).

digitalentropy commented 7 years ago

Iceman, got your fork to compile but I can't flash it for some reason...

Loading ELF file '../bootrom/obj/bootrom.elf'...
Loading usable ELF segments:
0: V 0x00100000 P 0x00100000 (0x00000200->0x00000200) [R X] @0x94
1: V 0x00200000 P 0x00100200 (0x00000c00->0x00000c00) [RWX] @0x298

Waiting for Proxmark to appear on /dev/cu.usbmodem1411. Found.
Error: Unexpected reply 0x00fe NACK (expected ACK)

marshmellow42 commented 7 years ago

Try using the normal flasher from master with his ELF files.

digitalentropy commented 7 years ago

Thanks for the LPT, @marshmellow42. Successfully flashed.

@iceman1001, your fork does indeed appear to work more reliably as far as client-device communication is concerned. Quite reliable, in fact. It also doesn't segfault/crash when I exit.

iceman1001 commented 7 years ago
marshmellow42 commented 7 years ago

I had the segfault error on Linux before my changes to the graph window. I don't think it is related.

iceman1001 commented 7 years ago

I previously had a different "qt exception" error on Ubuntu. Now, after your new changes, the old error is gone but I get a segmentation fault instead. So for me that is related :)

marshmellow42 commented 7 years ago

In my testing on Kali and Ubuntu before my changes, I'd sometimes get the segfault and sometimes the qt exception. I fixed the exception (a threading issue) but couldn't find the segfault.

marshmellow42 commented 7 years ago

But to the problem at hand: it sounds like we can work up a patch for master. I'm curious, though, whether the USB differences in your fork are what make the USB port change when flashing, or whether it's the flasher changes you made.

iceman1001 commented 7 years ago

It's not a major issue to me. Qt exception or segfault, same same but different.

I've made some changes to the USB serial configuration, and another user made some too. No idea why the COM port insists on changing; I thought it was something in my setup.

marshmellow42 commented 7 years ago

I know this isn't exactly the topic, but the COM port # change on iceman's fork is likely because the USB device's serial number is defined differently on iceman's fork than on master. So anyone switching from a master bootrom (or ARM image) to iceman's, or vice versa, will have the port # change on them.

Now I will look at potential improvements I can include in master, since I think I can avoid this issue.
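For context, a USB device advertises its serial number through the iSerialNumber field of the standard 18-byte device descriptor (USB 2.0 specification, section 9.6.1). That field holds an index into the device's string descriptors, and an index of 0 means the device reports no serial number at all. Operating systems may use the serial number, when one is present, to recognize a device as "the same one" across reconnects. The mechanism marshmellow42 suggests is therefore plausible, but it is an assumption here rather than a confirmed diagnosis. The sketch below serializes that descriptor in C. The VID/PID (0x9ac4/0x4b8f) and string indices (Mfr=1, Product=2, SerialNumber=0) are taken from the Linux log posted later in this thread. The bcdUSB, class, bMaxPacketSize0, bcdDevice and configuration-count values are placeholders, and the function name is hypothetical; none of these come from the proxmark3 source.

```c
#include <stdint.h>

#define USB_DT_DEVICE       0x01
#define USB_DEVICE_DESC_LEN 18

/* Serialize a USB 2.0 device descriptor (multi-byte fields are little-endian).
   String indices point into the device's string descriptors; 0 means "no string",
   so i_serial == 0 means the device reports no serial number. */
static void build_device_descriptor(uint8_t out[USB_DEVICE_DESC_LEN],
                                    uint16_t vid, uint16_t pid,
                                    uint8_t i_mfr, uint8_t i_product, uint8_t i_serial) {
    out[0]  = USB_DEVICE_DESC_LEN;                 /* bLength */
    out[1]  = USB_DT_DEVICE;                       /* bDescriptorType */
    out[2]  = 0x00; out[3] = 0x02;                 /* bcdUSB = 2.00 (placeholder) */
    out[4]  = 0x02;                                /* bDeviceClass = CDC (placeholder) */
    out[5]  = 0x00;                                /* bDeviceSubClass */
    out[6]  = 0x00;                                /* bDeviceProtocol */
    out[7]  = 0x08;                                /* bMaxPacketSize0 (placeholder) */
    out[8]  = (uint8_t)(vid & 0xFF);               /* idVendor, low byte */
    out[9]  = (uint8_t)(vid >> 8);                 /* idVendor, high byte */
    out[10] = (uint8_t)(pid & 0xFF);               /* idProduct, low byte */
    out[11] = (uint8_t)(pid >> 8);                 /* idProduct, high byte */
    out[12] = 0x01; out[13] = 0x00;                /* bcdDevice (placeholder) */
    out[14] = i_mfr;                               /* iManufacturer */
    out[15] = i_product;                           /* iProduct */
    out[16] = i_serial;                            /* iSerialNumber */
    out[17] = 0x01;                                /* bNumConfigurations (placeholder) */
}
```

Calling build_device_descriptor(d, 0x9ac4, 0x4b8f, 1, 2, 0) yields a descriptor whose byte 16 is 0, i.e. no serial number string. A firmware that fills that index in (or reports a different string) presents itself differently to the host.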

digitalentropy commented 7 years ago

More on topic, I will revise my earlier comment and say that iceman’s fork is far more stable from a client communication perspective, but I’ve had a couple instances where I’m still observing unexplainable delays.


iceman1001 commented 7 years ago

...and now I need to see where in the code the USB serial number is created.
Is it in common/usbcdc.c? brigato made some changes to make it more stable with STM controllers. He mentioned something about invalid values. You should find those changes by comparing with PM3 master.

pwpiwi commented 7 years ago

I have created PR #287 which contains some of the changes in @iceman1001's fork plus some more.

However, how do we know that the issue is related to USB communication at all?

iceman1001 commented 7 years ago

You don't. You only know that the iceman fork has more stable communication between device and client. The iceman fork also has a lot of minor client fixes, any of which could have been the source of the problems. The USB fixes you just made a PR for solve cases where the OS doesn't like the newly connected device; for some embedded controllers (STM) these changes are needed to make it work. It's a good fix nevertheless.

marshmellow42 commented 7 years ago

However, if all commands, LF and HF, exhibit the same issue, as the OP reports, then what else could it be?

Btw, good work pwpiwi; you beat me to it and did a better job. I was also interested in the uart changes. What do you think?

iceman1001 commented 7 years ago

The uart.c changes give my client much faster USB transfer speeds in my mingw environment. Before, it was really slow; with them it's quite good. I used to get USB failures with timeouts in the mingw environment, and never since. I don't think the difference was as clear in my Ubuntu environment as in the mingw one.

pwpiwi commented 7 years ago

I am quite sure that the settings are ignored, at least on Windows. I am getting transfer speeds of around 500 kByte/s (hw status), while the setting for Windows in uart.c is 9600 bit/s. What do you get?
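For scale, this calculation assumes the common 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte) and reads 500 kByte/s as 500,000 byte/s. Under those assumptions, a genuine 9600 bit/s serial line would top out at 960 byte/s:

```latex
\frac{9600\ \text{bit/s}}{10\ \text{bit/byte}} = 960\ \text{byte/s},
\qquad
\frac{500\,000\ \text{byte/s}}{960\ \text{byte/s}} \approx 521
```

The measured rate is roughly 500 times what the configured baud rate would allow. That is consistent with the baud setting having no effect on the USB CDC link.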

iceman1001 commented 7 years ago

I use VMware virtual machines: 14000 b/s without the changes, 400 kB/s with them.

marshmellow42 commented 7 years ago

Sorry, I misspoke (I was on my phone and didn't look it up first). What I meant to say is that I was interested in the UDP_CLEAR_EP_FLAGS and UDP_SET_EP_FLAGS macros and their use, as they appear to clear or check more status bits than the current implementation. But I'm no expert here.

iceman1001 commented 7 years ago

OK, you mean the macros. Well, I was looking at usb_cdc implementations, and modern implementations of that file (ours is a few years old) use these kinds of macros to set/clear bits instead of the old "while" statements.

pwpiwi commented 7 years ago

I had a closer look at those and indeed they could avoid some issues. This is because the simple reg &= flags also reads and writes the bits which are not affected by flags. And writing a bit, even if the value isn't changed, can have an effect. On the other hand, writing a 1 into a bit which has been read as 0 doesn't have an effect for those bits defined in REG_NO_EFFECT_1_ALL.
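The hazard pwpiwi describes can be illustrated with a minimal host-side simulation (not firmware code). A fake CSR register treats the bits in REG_NO_EFFECT_1_ALL as "cleared by writing 0, unaffected by writing 1". The hardware is modeled as setting RX_DATA_BK1 in the window between the read and the write. Bit positions are modeled on the AT91SAM7 UDP_CSR layout and should be treated as assumptions. The names csr_hw, csr_write, hw_event_in_race_window, naive_clear and safe_clear are hypothetical.

```c
#include <stdint.h>

/* Bit positions modeled on the AT91SAM7 UDP_CSR register (assumption). */
#define UDP_TXCOMP      (1u << 0)
#define UDP_RX_DATA_BK0 (1u << 1)
#define UDP_RXSETUP     (1u << 2)
#define UDP_STALLSENT   (1u << 3)
#define UDP_RX_DATA_BK1 (1u << 6)

/* Bits that are cleared by writing 0 and unaffected by writing 1. */
#define REG_NO_EFFECT_1_ALL (UDP_TXCOMP | UDP_RX_DATA_BK0 | UDP_RXSETUP | \
                             UDP_STALLSENT | UDP_RX_DATA_BK1)

static uint32_t csr_hw; /* simulated hardware register */

/* Simulated write: no-effect-1 bits keep their value when written as 1 and
   are cleared when written as 0; all other bits take the written value. */
static void csr_write(uint32_t value) {
    uint32_t w0c  = csr_hw & REG_NO_EFFECT_1_ALL & value;
    uint32_t rest = value & ~REG_NO_EFFECT_1_ALL;
    csr_hw = w0c | rest;
}

/* Models the hardware setting a flag between our read and our write. */
static void hw_event_in_race_window(void) { csr_hw |= UDP_RX_DATA_BK1; }

/* Plain read-modify-write: a flag that was read as 0 is written back as 0,
   which clears it if the hardware set it in the meantime. */
static uint32_t naive_clear(uint32_t flags) {
    uint32_t reg = csr_hw;
    hw_event_in_race_window();
    reg &= ~flags;
    csr_write(reg);
    return csr_hw;
}

/* Macro-style clear: write 1 (no effect) to every no-effect-1 bit except the
   ones being cleared, so concurrently set flags survive. */
static uint32_t safe_clear(uint32_t flags) {
    uint32_t reg = csr_hw;
    hw_event_in_race_window();
    reg |= REG_NO_EFFECT_1_ALL;
    reg &= ~flags;
    csr_write(reg);
    return csr_hw;
}
```

Starting from csr_hw = UDP_RX_DATA_BK0, naive_clear(UDP_RX_DATA_BK0) also wipes the BK1 flag the hardware just set, whereas safe_clear(UDP_RX_DATA_BK0) clears only BK0 and leaves BK1 pending. The real macros additionally poll the register until the change becomes visible; that wait loop is omitted here.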

Unfortunately all those changes show no effect in my test environments (Kali, Arch, mingw). @digitalentropy: Does it make a difference for you?

marshmellow42 commented 7 years ago

@digitalentropy, please test pull request 287 by pwpiwi and let us know whether the issues you experience on OS X have improved. (see https://help.github.com/articles/checking-out-pull-requests-locally/ )

Thanks pwpiwi!

iceman1001 commented 7 years ago

Some feedback for OSX users, https://github.com/Proxmark/proxmark3/pull/287#issuecomment-297811764

digitalentropy commented 7 years ago

@iceman1001 @pwpiwi Thanks for trying to look into this. I've tested #287 and, disappointingly, it doesn't appear to help. I've also reviewed and double-checked the comments from @cjbrigato regarding various serial drivers, but after looking through the output of kextstat I have found no non-Apple serial or CDC drivers loaded.

It is curious to me that some others do not appear to have this issue, and I am now wondering what I can do to investigate further. Does anyone happen to know if there is a clean way of snooping on a specific serial device that could shed light on what's going on?

digitalentropy commented 7 years ago

Alright, I have more useful data to share. Although I distinctly recall running into this issue long before, I also remember a time when I had isolated it to the version of the readline library being used, and had worked around it by switching from Homebrew to MacPorts. In fact, MacPorts has been the only way I can consistently get all old and new versions to compile in the same environment, so for the moment, since I'm not using Homebrew for anything else, I've switched completely to MacPorts.

Now...

I've tested everything up to and including 2.3.0 with my current build environment and found no unusual behavior in the client. However, the current master has major, major issues. So whatever the culprit is, it appeared during the last year. I realize that's not a very narrow window, but it's what I have at the moment. I can reproduce it extremely consistently by switching between 2.3.0 and the current version.

Later tonight I'll see if I can narrow the window further and see what's going on.

digitalentropy commented 7 years ago

I did some more investigating after dinner tonight, and I suspect that, as before, the issue is inconsistencies between various libraries.

When I had issues before, it was because the version of readline that Homebrew installed was somehow different from what MacPorts installed. With Homebrew libraries everything compiled fine, but my client would constantly have comm issues with the pm3. With MacPorts installed, those delay issues appeared to go away.

When testing recent builds tonight, I noticed I only observed the worst delays when the client was compiled with the GUI. Suspecting Qt, I installed whatever version of qt5 MacPorts provides and recompiled.

Apparently I also had some other Qt library installed (is there one that comes with Xcode?), because it still compiled with the GUI, but this time it appeared to be far more stable. I haven't observed the ever-increasing command delay that I've been seeing quite severely recently.

I still get a segfault when exiting, but at least communication appears to be more consistent. I do wish it wasn't so unclear when there are library conflicts. Generally speaking macports is pretty dirty and I would much prefer to be able to just stick with homebrew but I've had very inconsistent experiences with it and the proxmark3 code.

As a follow-up test I may do a clean install of 10.12.4 onto an SD card and see if I get similar behavior on fresh install.

digitalentropy commented 7 years ago

Is there a flag I can set to compile the client without GUI? Would be useful for testing.

cjbrigato commented 7 years ago

Out of curiosity, I deliberately created a "bad and conflicting" Qt installation on OS X (by mixing multiple versions across Homebrew and MacPorts) in order to reproduce this. It seems that the problem is more a client misbehavior than a USB communication misbehavior. In that specific case, compiling "without" the GUI solved the problem. In fact I used a dummy GUI with nothing Qt-related, though you could also skip the GUI compilation by editing the Makefile or by other means. Anyway, try to clean up your libs, don't mess too much with poorly managed Homebrew or MacPorts installs, and if possible avoid mixing MacPorts and Homebrew.

digitalentropy commented 7 years ago

I agree, and thank you for going through the trouble of making an additional data point.

What edit did you make to the Makefile to compile without GUI? I'll have to see if it's possible to resolve by manually finding and cleaning qt libs.


pwpiwi commented 7 years ago

If you want to compile without GUI, just comment out the GUI-detection part in client/Makefile:

# Check for correctly configured Qt5
# QTINCLUDES = $(shell pkg-config --cflags Qt5Core Qt5Widgets 2>/dev/null)
# QTLDLIBS = $(shell pkg-config --libs Qt5Core Qt5Widgets 2>/dev/null)
# MOC = $(shell pkg-config --variable=host_bins Qt5Core)/moc
# UIC = $(shell pkg-config --variable=host_bins Qt5Core)/uic
# ifeq ($(QTINCLUDES), )
# # if Qt5 not found check for correctly configured Qt4 
    # QTINCLUDES = $(shell pkg-config --cflags QtCore QtGui 2>/dev/null)
    # QTLDLIBS = $(shell pkg-config --libs QtCore QtGui 2>/dev/null)
    # MOC = $(shell pkg-config --variable=moc_location QtCore)
    # UIC = $(shell pkg-config --variable=uic_location QtCore)
# else
    # CXXFLAGS += -std=c++11 -fPIC
# endif
# ifeq ($(QTINCLUDES), )
# # if both pkg-config commands failed, search in common places
    # ifneq ($(QTDIR_), )
        # QTINCLUDES = -I$(QTDIR)/include -I$(QTDIR)/include/QtCore -I$(QTDIR)/include/QtGui
        # QTLDLIBS = -L$(QTDIR)/lib -lQtCore4 -lQtGui4
        # ifneq ($(wildcard $(QTDIR)/include/QtWidgets),)
            # QTINCLUDES += -I$(QTDIR)/include/QtWidgets
            # QTLDLIBS = -L$(QTDIR)/lib -lQt5Widgets -lQt5Gui -lQt5Core
            # CXXFLAGS += -std=c++11 -fPIC
        # endif
        # MOC = $(QTDIR)/bin/moc
        # UIC = $(QTDIR)/bin/uic
    # endif
# endif

iceman1001 commented 7 years ago

I think that after all the latest fixes to the dev environment, a nice clean build via Homebrew should work. The original serial communication problems should not be an issue anymore.

Time to close?

iceman1001 commented 7 years ago

@digitalentropy how is your serial communications going when you use release v3.0.1 ?

pwpiwi commented 7 years ago

Is this still an issue?

digitalentropy commented 7 years ago

Hey pwpiwi,

I haven’t had a chance to re-test due to work schedules, but will do so soon. The plan is to test from a completely clean install instead of from an old environment.


pwpiwi commented 7 years ago

Any news?

masterix21 commented 7 years ago

I can confirm the Qt bug that causes lag on every response; I found it several months ago. Tested only on macOS Sierra.

digitalentropy commented 7 years ago

I can also confirm that the bug still exists. It was observed on two clean installs of macOS last week.


broncotc commented 7 years ago

My lf search also causes a segfault. I'm on Linux, Debian testing.

jmichelp commented 7 years ago

I used a protocol analyzer to trace this annoying bug. It seems the PM3 has a lot of timeouts (SOF / NAK periods), which after a while seem to be followed by CDC commands setting the line to 9600 baud. Also, a lot of 0x00-filled buffers are being transferred on EP2 (CDC IN). Not sure whether this is expected behavior.

Not sure if this is the root cause; I would have to compare the dump with one from Linux, for example.
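When reading such a capture, the "set the line at 9600 baud" commands are CDC SET_LINE_CODING requests (bmRequestType 0x21, bRequest 0x20). Their data stage is a 7-byte payload defined by the USB CDC PSTN subclass specification:

- dwDTERate: a little-endian uint32 baud rate
- bCharFormat: 0 = 1 stop bit
- bParityType: 0 = none
- bDataBits

The following is a minimal decoder sketch. The struct and function names are hypothetical and not taken from the proxmark3 source.

```c
#include <stddef.h>
#include <stdint.h>

#define CDC_SET_LINE_CODING 0x20 /* bRequest value for SET_LINE_CODING */

/* CDC line coding payload (7 bytes on the wire, little-endian). */
typedef struct {
    uint32_t dte_rate;    /* dwDTERate: baud rate in bit/s */
    uint8_t  char_format; /* bCharFormat: 0 = 1 stop bit, 1 = 1.5, 2 = 2 */
    uint8_t  parity_type; /* bParityType: 0 = none, 1 = odd, 2 = even, 3 = mark, 4 = space */
    uint8_t  data_bits;   /* bDataBits: 5, 6, 7, 8 or 16 */
} line_coding_t;

/* Decode a SET_LINE_CODING data stage; returns 0 on success, -1 if too short. */
static int parse_line_coding(const uint8_t *buf, size_t len, line_coding_t *out) {
    if (len < 7) {
        return -1;
    }
    out->dte_rate = (uint32_t)buf[0]
                  | ((uint32_t)buf[1] << 8)
                  | ((uint32_t)buf[2] << 16)
                  | ((uint32_t)buf[3] << 24);
    out->char_format = buf[4];
    out->parity_type = buf[5];
    out->data_bits   = buf[6];
    return 0;
}
```

For example, the payload 80 25 00 00 00 00 08 decodes to 9600 baud, 1 stop bit, no parity, 8 data bits (8N1).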

iceman1001 commented 7 years ago

Hm... @jmichelp, very interesting, since I'm fiddling with the usb_cdc implementation as well. Do you have an email address I can write to?

micolous commented 7 years ago

I can also see this behaviour just by plugging the device into a Linux machine: it takes 10 seconds to actually come up as a device:

[14646.116202] usb 3-13.2: new full-speed USB device number 63 using xhci_hcd
[14651.188644] usb 3-13.2: device descriptor read/64, error -110
[14656.383699] usb 3-13.2: New USB device found, idVendor=9ac4, idProduct=4b8f
[14656.383707] usb 3-13.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[14656.383711] usb 3-13.2: Product: PM3
[14656.383715] usb 3-13.2: Manufacturer: proxmark.org
[14656.384011] usb 3-13.2: ep 0x83 - rounding interval to 1024 microframes, ep desc says 2040 microframes
[14656.384606] cdc_acm 3-13.2:1.0: ttyACM0: USB ACM device

I captured at the same time with Wireshark, and saw that it took 5 seconds to respond to a GetDescriptorString request, for Index 0x02 (the product name), language 0x0409.
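The request that stalled returns a USB string descriptor. Its layout is bLength, then bDescriptorType = 0x03, then the string as UTF-16LE code units. 0x0409 is the LANGID for US English, and string index 0 is special: it returns the list of supported LANGIDs rather than a string. The following is a minimal builder sketch for ASCII-only strings such as the "PM3" product name; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USB_DT_STRING 0x03

/* Build a string descriptor for an ASCII string into out.
   Returns the descriptor length, or 0 if it does not fit. */
static size_t build_string_descriptor(const char *ascii, uint8_t *out, size_t out_size) {
    size_t n = strlen(ascii);
    size_t len = 2 + 2 * n;
    if (len > 255 || len > out_size) {
        return 0; /* bLength is a single byte, and the buffer must hold it all */
    }
    out[0] = (uint8_t)len;      /* bLength */
    out[1] = USB_DT_STRING;     /* bDescriptorType */
    for (size_t i = 0; i < n; i++) {
        out[2 + 2 * i] = (uint8_t)ascii[i]; /* UTF-16LE low byte */
        out[3 + 2 * i] = 0x00;              /* high byte is 0 for ASCII */
    }
    return len;
}
```

For "PM3" this produces the 8-byte descriptor 08 03 50 00 4D 00 33 00, so the reply the host waited 5 seconds for is tiny. That suggests the delay lies in when the device answers, not in how much it sends.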