ac2cz / Falcon

Amsat Pacsat Groundstation for FalconSat3
9 stars 0 forks source link

ER_BODY_CHECK received while uploading #43

Closed ac2cz closed 5 years ago

ac2cz commented 5 years ago

BX2ABT has a file that won't upload. It gets stuck in a loop. It seems to send the same frame over and over.

Worth checking previous issues, such as #27

His details pasted below: Up til now I have been successful in getting PacSatGroundStation (PSGS) to work on receive. You can go here if you want to see my setup: http://bx2abt.com/main/Falconsat-3 . I can hit FalconSat-3 with a directory request or file download request and get an immediate response and consequent downloads. Uploading is a different story, however. I have been able to get some messages uploaded, but it was more by chance. Right now if I want to upload I seem to get into a loop where PSGS keeps on sending out the same frame (I presume), then disconnects, after which the whole cycle starts over again.

From the PSGS debugging info you can see that I am successful in logging into PFS3-12. My file gets a designated ID (1f14 in this case) and is ready to be received. Then it uploads what I presume is one frame (you can see this in Direwolf where the same chunk of data is send over and over again) after which possibly the reason why things don't work: "ER_BODY_CHECK received while uploading: /home/hans/PacSatGroundData/FalconSat-3/BX2ABT17.txt.out". The only mention of ER_BODY_CHECK I can find on the internet is that it is sent when the checksum on the file body fails. After that the whole sequence starts again, so the message never gets uploaded and out of the queue.

Does this problem lie with FalconSat-3, Direwolf or PGSG, or somewhere else? Where is the checksum being made? This is my first forray into packet radio/PacSat, so I'm not an expert (yet). I don't want to make any (un)educated guesses, thus any help or hints are appreciated. Cheers,

Hans BX2ABT

This is the output (with debugging turned on) of PSGS:

--------BEGIN------- Open ABCD: Ready to upload file: BX2ABT17.txt.out B: 209908947. U>> DEBUG 3: Open: Adding UP LINK Event: UL_CONNECT PB: Empty.. I>>From:PFS3-12 to BX2ABT Ctrl: 0 Type: I Cmd PF: 0 NR: 0 NS: 0 5 2 cc a7 fb 5c 4> SUCCESSFUL LOGIN to PFS3-12 by BX2ABT DEBUG 3: Open: Adding UP LINK Event: From:PFS3-12 to BX2ABT Ctrl: 0 Type: I Cmd PF: 0 NR: 0 NS: 0 5 2 cc a7 fb 5c 4> SUCCESSFUL LOGIN to PFS3-12 by BX2ABT DEBUG 3: Cmd Ok: Adding UP LINK Event: UL_REQUEST_UPLOAD DEBUG 3: Cmd Ok: UL_CMD: From:BX2ABT to PFS3-12 Ctrl: 0 Type: I Cmd PF: 0 NR: 0 NS: 0 8 3 14 1f 0 0 31 3 0 0 UL CMD: UPLOAD_CMD File: 1f14 Size: 331 S>> I>>From:PFS3-12 to BX2ABT Ctrl: 22 Type: I Cmd PF: 0 NR: 1 NS: 1 8 4 14 1f 0 0 31 3 0 0> Ready to receive file: 1f14 from BX2ABT at off: 817 DEBUG 3: Waiting: Adding UP LINK Event: From:PFS3-12 to BX2ABT Ctrl: 22 Type: I Cmd PF: 0 NR: 1 NS: 1 8 4 14 1f 0 0 31 3 0 0> Ready to receive file: 1f14 from BX2ABT at off: 817 DEBUG 3: Waiting: UL_GO_RESP: From:PFS3-12 to BX2ABT Ctrl: 22 Type: I Cmd PF: 0 NR: 1 NS: 1 8 4 14 1f 0 0 31 3 0 0> Ready to receive file: 1f14 from BX2ABT at off: 817 DEBUG 3: Waiting: GO FILE>From:PFS3-12 to BX2ABT Ctrl: 22 Type: I Cmd PF: 0 NR: 1 NS: 1 8 4 14 1f 0 0 31 3 0 0> Ready to receive file: 1f14 from BX2ABT at off: 817 DEBUG 3: Data: Adding UP LINK Event: UL_DATA_END DEBUG 3: Data: UL_DATA_END: UL_DATA_END S>> I>>From:PFS3-12 to BX2ABT Ctrl: 44 Type: I Cmd PF: 0 NR: 2 NS: 2 1 7 10> UL NAK 16: ER_BODY_CHECK DEBUG 3: Data End: Adding UP LINK Event: From:PFS3-12 to BX2ABT Ctrl: 44 Type: I Cmd PF: 0 NR: 2 NS: 2 1 7 10> UL NAK 16: ER_BODY_CHECK DEBUG 3: Data End: UL_NAK_RESP: From:PFS3-12 to BX2ABT Ctrl: 44 Type: I Cmd PF: 0 NR: 2 NS: 2 1 7 10> UL NAK 16: ER_BODY_CHECK NAK: ER_BODY_CHECK received while uploading: /home/hans/PacSatGroundData/FalconSat-3/BX2ABT17.txt.out

TIME-1: PHT: uptime is 469/13:51:30. Time is Sat Jun 08 12:19:29 2019. U>> DEBUG 3: Idle: Adding UP LINK Event: UL_DISCONNECTED DEBUG 3: Idle: Adding UP LINK Event: Open ABCD: Open ABCD: B: 209908947. PB: Empty.. ---------END-------

This is what Direwolf's output is:

--------BEGIN------- PFS3-1 audio level = 68(+118/-110) [NONE] ||||||||| [0.4] PFS3-1>LSTAT:I P:0x13A8 o:0 l:31473 f:31529, d:1 st:4 e:26<0x0d> Unknown APRS Data Type Indicator "I", Ambulance [0L] BX2ABT>PFS3-12:(SABM cmd, p=1)

PFS3-12 audio level = 69(+115/-111) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(UA res, f=0)

PFS3-12 audio level = 70(+119/-110) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(I cmd, n(s)=0, n(r)=0, p=0, pid=0xf0)<0x05><0x02>^<0xa8><0xfb>\<0x04> [0L] BX2ABT>PFS3-12:(I cmd, n(s)=16, n(r)=120, p=0, pid=0x08)<0x08><0x03><0x14><0x1f><0x00><0x00>1<0x03><0x00><0x00>

PFS3-12 audio level = 70(+121/-110) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(RR res, n(r)=1, f=0)

PFS3-12 audio level = 70(+121/-109) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(I cmd, n(s)=17, n(r)=120, p=0, pid=0x08)<0x04><0x14><0x1f><0x00><0x00><0xfe><0x00><0x00><0x00> [0L] BX2ABT>PFS3-12:(I cmd, n(s)=1, n(r)=2, p=0, pid=0xf0)<0xfe><0x00> the internet server that you sent me a message on April 11, but unfortunately I never was able to download it. It took a while to get the transmitting part working, but now it does and I'm very pleased to be able to contact you via FS-3. Truly an amazin [0L] BX2ABT>PFS3-12:(RR cmd, n(r)=2, p=1)

PFS3-11 audio level = 65(+118/-109) [NONE] ||||||||| [0.4] PFS3-11>PBLIST:PB: Empty.<0x0d> Unknown APRS Data Type Indicator "P", Original Balloon (think Ham b

PFS3-12 audio level = 66(+117/-109) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(RR res, n(r)=1, f=1) [0L] BX2ABT>PFS3-12:(I cmd, n(s)=33, n(r)=120, p=0, pid=0x08)<0x08><0x03><0x00><0x00><0x00><0x00>1<0x03><0x00><0x00>

PFS3-12 audio level = 74(+125/-111) [NONE] ||||||||| [0.4] PFS3-12>BX2ABT:(DISC cmd, p=1) [0L] BX2ABT>PFS3-12:(UA res, f=1) --------END-------

K4KDR commented 5 years ago

Just so that it doesn't appear that Hans has an isolated incident specific to his setup, I have experienced the same issue. Unfortunately I don't have much to contribute in the way of details that might help identify cause, but after initially having no problems with BBS file uploads, suddenly when I would queue a small plain text upload, it would never transmit. This has been some time ago so I apologize for not remembering every detail. However, if a fix is not identified I'll be glad to switch back from WiSP to this app again and attempt more uploads. -Scott, K4KDR

ac2cz commented 5 years ago

Thanks Scott,

I'm looking into this. Do small files work and large ones fail? Or do all files now fail? Did I break it with an update?

Chris

On Mon, Jun 24, 2019, 17:41 K4KDR notifications@github.com wrote:

Just so that it doesn't appear that Hans has an isolated incident specific to his setup, I have experienced the same issue. Unfortunately I don't have much to contribute in the way of details that might help identify cause, but after initially having no problems with BBS file uploads, suddenly when I would queue a small plain text upload, it would never transmit. This has been some time ago so I apologize for not remembering every detail. However, if a fix is not identified I'll be glad to switch back from WiSP to this app again and attempt more uploads. -Scott, K4KDR

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ac2cz/Falcon/issues/43?email_source=notifications&email_token=ADRDKWMIZLZUUYQSEUG6EODP4E5PZA5CNFSM4H3CNPJKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYOKCWI#issuecomment-505192793, or mute the thread https://github.com/notifications/unsubscribe-auth/ADRDKWK4H7FISZ2KM56R6RDP4E5PZANCNFSM4H3CNPJA .

K4KDR commented 5 years ago

I only attempted small uploads.

I do regret that it has been some time since I observed this and for the sake of quickly getting some messages uploaded, reverted to WiSP. Looking at my Windows computer where the app is installed, I will see if I can attach the last message-related files and log files from in & around that date. (.TXT extension added for github uploader)

PacSatGround20190408.log PacSatGround20190407.log K4KDR13.txt..20190407.1017.err.txt K4KDR15.txt..20190407.1016.ul.txt PacSatGround20190406.log

ghost commented 5 years ago

20190608-fs3-psgs_output.txt 20190608-fs3-direwolf_output.txt BX2ABT17.txt..20190607.1227.err.txt

Hello Chris and Scott, Here three files I saved on June 8, one with the output of PSGS with debug turned on, the other with (most of) the output of Direwolf. The third file is the message, which was rejected this time, but most of the time it simply stays in queue.

The last messages I got uploaded were very short, mostly one-liners, so it might be that the problem lies only with longer messages. HTH. 73 de Hans (BX2ABT)

ac2cz commented 5 years ago

Ok thanks. I have been investigating this.

First question, are you running the latest version 0.22? The error message in the printout supplied is from an older version. But perhaps that is an error in how I built 0.22. But the error message does not matter too much, as it is not causing the issue.

Second question is if the upload file gets renamed to have a .err ending? It looks like it does not and the program just keeps trying to upload it. That is perhaps because I keep the file open accidentally and it can not be renamed. I can fix that potential bug. But with that said, perhaps we don't want to rename it, as I will explain below.

In the situation shown, the file is length 331 (in hex) or 817 bytes in decimal. When you request that it be uploaded you already have the File id of 1f14 so you are trying to continue an upload. FS-3 tells you to continue from offset 817, which is the end of the file. I presume that happens when we upload an entire file but then fail to send the DATA_END command. The spec says that if the offset is greater than or equal to the file size then we send DATA_END at that point. So that is what PSG does. FS-3 responds with ER_BODY_CHECK, meaning the file is corrupt.

So perhaps on of the following is going on: A/ I have an error in the way the BODY CHECKSUM is calculated, but then why would it work for some files and not others? B/ The file is actually corrupted, sitting on the spacecraft some bits have been flipped by radiation. So it fails. In that case we should not mark the file as ERR or retry as before. We should instead reset the File Id to 0 and start again as though it is a new file. C/ All continuations are failing because I have a bug in the logic. So only files that upload first time completely work.

ghost commented 5 years ago

Hello Chris,

On 06/26/2019 11:23 AM, Chris Thompson wrote:

Ok thanks. I have been investigating this.

Thank you. I know you are busy and this is a pet project, so any time you spend is appreciated. My only reason for pointing out problems is to help you debug and I also learn about it myself. So any side explanation you can give is also very much appreciated.

First question, are you running the latest version 0.22? The error message in the printout supplied is from an older version. But perhaps that is an error in how I built 0.22. But the error message does not matter too much, as it is not causing the issue.

0.21 wouldn't run on my Kubuntu system, so you fixed that (I believe last April or May). Only 0.22 will run on my system.

Second question is if the upload file gets renamed to have a .err ending? It looks like it does not and the program just keeps trying to upload it. That is perhaps because I keep the file open accidentally and it can not be renamed. I can fix that potential bug. But with that said, perhaps we don't want to rename it, as I will explain below.

I included this file because it was the one used in for that session. Surprisingly it was renamed with the .err extension, while other previous uploads simply remained in queue.

In the situation shown, the file is length 331 (in hex) or 817 bytes in decimal. When you request that it be uploaded you already have the File id of 1f14 so you are trying to continue an upload. FS-3 tells you to continue from offset 817, which is the end of the file. I presume that happens when we upload an entire file but then fail to send the DATA_END command. The spec says that if the offset is greater than or equal to the file size then we send DATA_END at that point. So that is what PSG does. FS-3 responds with ER_BODY_CHECK, meaning the file is corrupt.

So perhaps on of the following is going on: A/ I have an error in the way the BODY CHECKSUM is calculated, but then why would it work for some files and not others?

Is it because shorter files only require one frame and larger filed more than one frame? What you can see from my Direwolf output is that the same frame is being transmitted over and over again, as if it is not accepted. Packet newbie here, so ignore if it doesn't make sense at all :-)

B/ The file is actually corrupted, sitting on the spacecraft some bits have been flipped by radiation. So it fails. In that case we should not mark the file as ERR or retry as before. We should instead reset the File Id to 0 and start again as though it is a new file. C/ All continuations are failing because I have a bug in the logic. So only files that upload first time completely work.

D/ the combination PSGS and Direwolf messes up things. Although its KISS, so why?

I switched shack PCs and started with a fresh installation of Kubuntu 19.04 and PSGS 0.22. Put the directory update to 3 days, but got a lot of holes now (21 holes with age of 9 days), so right now PSGS doesn't even come to uploading until those holes are filled, constantly putting out DL requests for those holes. You already wrote that this is due to FS-3 giving out new time stamps, which is quirky behaviour of FS-3. Isn't there a way in which to ignore those holes and prioritize other tasks? E.g. you can get DL requests for files automatically and manually, which are met because they apparently take priority over directory hole fill requests. Just a thought.

73 de Hans BX2ABT

ac2cz commented 5 years ago

Each upload packet only has 254 data bytes in it and the first packet is nearly completely filled with the pacsat header. So even 1 or 2 line messages spread across multiple packets. So messages with multiple packets work.

If we don't upload in a single pass then we have a "continuation". On the next pass FS-3 tells us the byte offset to continue from. I have trawled through that logic and run some tests. It looks to be fine.

I have fixed the logic for renaming to ERR so that should work reliably when we want it to. But I have also decided that ERR_BODY_CHECK and similar errors are recoverable. So I will not rename the file and will instead reset the file id to 0 and then the file will be retried from the beginning with all of the data being uploaded again. I think that is the correct response when the data is corrupted.

The packet you see being sent over and over is the DATA_END packet. That is the packet that we send if all the DATA packets have been sent. When FS-3 asks for data beyond the end of the file the correct response is DATA_END. That causes FS-3 to calc the check-sums and the BODY CHECKSUM fails. My calculation is good (its not hard) so we have to assume the data was uploaded incorrectly or corrupted on the server (either by radiation or in transit somehow. Packet it nearly perfect but can have errors - hence the need for the checksum in the first place.) When we fail PSG just tries again next time with clearly the same result. So we are stuck in a loop. So that is what I have changed by resetting the file id to 0.

These fixes are in 0.23 which will be on my website shortly.

It also fixes a bug where the AGE of the directory, as specified on the Spacecraft/FaclconSat-3 window, was not being honoured when holes are requested. So we are asking for too many directory holes and compounding our issues with old files that change!

ghost commented 5 years ago

Hello Chris. Just had a good pass of FS-3 and v0.24 seems to have fixed this issue. Was successful in uploading two messages during the pass. The issue with age of the directory listing is also fixed. Set the age at 5 days and it did indeed only download the last five days of headers. Great work! Thanks a bunch. See you on FS-3.

ac2cz commented 5 years ago

I will close this. There is a new issues to track the open dir holes